Thursday, May 16, 2019

python parallel processing

[1]https://research.wmz.ninja/articles/2018/03/on-sharing-large-arrays-when-using-pythons-multiprocessing.html
[2]https://wiki.python.org/moin/ParallelProcessing

very basic principle:
Pool doesn't support pickling shared data through its argument list. That's what the error message means by "objects should only be shared between processes through inheritance": the shared data has to be inherited by the workers, i.e., made global (module-level) if you want to share it using the Pool class.
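A minimal sketch of the failure that message comes from (names are illustrative; assumes the fork start method, where module-level objects are inherited): passing the shared Array through pool.map's argument list forces pickling and is rejected, while the inherited global works.

import multiprocessing as mp

shared = mp.Array('d', 100)          # shared-memory array, created before the Pool

def use_inherited(i):
    shared[i] = float(i)             # touches the inherited module-level object

def use_argument(arr):
    arr[0] = 1.0                     # would require pickling the shared Array

if __name__ == '__main__':
    with mp.Pool(2) as pool:
        pool.map(use_inherited, range(10))      # fine: shared is inherited
        # pool.map(use_argument, [shared])      # rejected: share through inheritance
    print(shared[:10])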

Pickle! Oh my GOD!


import multiprocessing
import ctypes
import numpy as np

shared_array_base = multiprocessing.Array(ctypes.c_double, 10*10)
shared_array = np.ctypeslib.as_array(shared_array_base.get_obj())
shared_array = shared_array.reshape(10, 10)

#-- edited 2015-05-01: the assert check below checks the wrong thing
#   with recent versions of Numpy/multiprocessing. That no copy is made
#   is indicated by the fact that the program prints the output shown below.
## No copy was made
##assert shared_array.base.base is shared_array_base.get_obj()

# Parallel processing: each worker writes its row index into one row of the
# inherited shared array (nothing is pickled except the index).
def my_func(i, def_param=shared_array):
    shared_array[i, :] = i

if __name__ == '__main__':
    pool = multiprocessing.Pool(processes=4)
    pool.map(my_func, range(10))
    pool.close()
    pool.join()

    print(shared_array)
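A variant of the same idea that does not depend on a module-level global created before fork: the Pool initializer can hand the shared Array to each worker at startup (this also works under the spawn start method). A minimal sketch for Python 3; init_worker, fill_row and _shared are illustrative names.

import ctypes
import multiprocessing
import numpy as np

_shared = None   # per-worker ndarray view, set by the initializer

def init_worker(shared_array_base):
    # Re-wrap the shared buffer as an ndarray inside each worker (no copy).
    global _shared
    _shared = np.frombuffer(shared_array_base.get_obj(),
                            dtype=np.float64).reshape(10, 10)

def fill_row(i):
    _shared[i, :] = i

if __name__ == '__main__':
    base = multiprocessing.Array(ctypes.c_double, 10 * 10)
    with multiprocessing.Pool(processes=4,
                              initializer=init_worker,
                              initargs=(base,)) as pool:
        pool.map(fill_row, range(10))
    print(np.frombuffer(base.get_obj(), dtype=np.float64).reshape(10, 10))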

Tuesday, May 7, 2019

make a numpy array shared by multiple processes

http://coding.derkeiler.com/Archive/Python/comp.lang.python/2008-09/msg00937.html

On Sep 10, 6:39 am, Travis Oliphant <oliphant.tra...@xxxxxxxx> wrote:

I wanted to point anybody interested to a blog post that describes a
useful pattern for having a NumPy array that points to the memory
created by a different memory manager than the standard one used by
NumPy.


Here is something similar I have found useful:

There will be a new module in the standard library called
'multiprocessing' (cf. the pyprocessing package in the cheese shop). It
allows you to create multiple processes (as opposed to threads) for
concurrency on SMPs (cf. the dreaded GIL).

The 'multiprocessing' module lets us put ctypes objects in shared
memory segments (processing.Array and processing.Value). It has its
own malloc, so there is no 4k (one page) lower limit on object size.
Here is how we can make a NumPy ndarray view the shared memory
referenced by these objects:

try:
    import processing
except ImportError:
    import multiprocessing as processing

import numpy, ctypes

_ctypes_to_numpy = {
    ctypes.c_char   : numpy.int8,
    ctypes.c_wchar  : numpy.int16,
    ctypes.c_byte   : numpy.int8,
    ctypes.c_ubyte  : numpy.uint8,
    ctypes.c_short  : numpy.int16,
    ctypes.c_ushort : numpy.uint16,
    ctypes.c_int    : numpy.int32,
    ctypes.c_uint   : numpy.uint32,
    ctypes.c_long   : numpy.int32,   # assumes a 32-bit C long (platform-dependent)
    ctypes.c_ulong  : numpy.uint32,
    ctypes.c_float  : numpy.float32,
    ctypes.c_double : numpy.float64
}

def shmem_as_ndarray(array_or_value):
    """ view processing.Array or processing.Value as ndarray """
    # Reaches into the (then-new) multiprocessing internals to get a buffer
    # over the shared memory segment backing the object.
    obj = array_or_value._obj
    buf = obj._wrapper.getView()
    try:
        # Value: obj is a ctypes scalar instance, so its type is in the table
        t = _ctypes_to_numpy[type(obj)]
        return numpy.frombuffer(buf, dtype=t, count=1)
    except KeyError:
        # Array: look up the element type instead
        t = _ctypes_to_numpy[obj._type_]
        return numpy.frombuffer(buf, dtype=t)

With this simple tool we can make processes created by multiprocessing
work with ndarrays that reference the same shared memory segment. I'm
doing some scalability testing on this. It looks promising :)
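For reference, on current Python the same zero-copy view can be obtained through the public API alone, without reaching into multiprocessing internals. A rough modern equivalent of the function above (the name shmem_view is mine):

import ctypes
import multiprocessing
import numpy as np

def shmem_view(array_or_value):
    """Zero-copy ndarray view of a multiprocessing.Array or Value."""
    obj = array_or_value.get_obj()            # underlying ctypes object
    if isinstance(obj, ctypes.Array):
        return np.ctypeslib.as_array(obj)     # 1-D view over the shared buffer
    # Value: view the single ctypes scalar as a length-1 array
    return np.ctypeslib.as_array(ctypes.pointer(obj), shape=(1,))

if __name__ == '__main__':
    a = multiprocessing.Array(ctypes.c_double, 8)
    v = multiprocessing.Value(ctypes.c_int, 42)
    print(shmem_view(a).shape, shmem_view(v)[0])   # -> (8,) 42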



NumPy arrays with pre-allocated memory