
Passing a function that accepts class member functions as arguments to Python multiprocessing pool.map()

Hi, I've been struggling with this for the better part of the morning and was hoping someone could point me in the right direction.

This is the code I have at the moment:

from multiprocessing import Pool

def f(tup):
    # pool.map() passes a single argument, so unpack the tuple here
    return some_complex_function(*tup)

def main():
    pool = Pool(processes=4)
    # import and process data omitted
    _args = [(x.some_func1, .05, x.some_func2) for x in list_of_some_class]
    results = pool.map(f, _args)
    print(results)

The first error I get is:

> Exception in thread Thread-2:
> Traceback (most recent call last):
>   File "/usr/lib/python2.7/threading.py", line 551, in __bootstrap_inner
>     self.run()
>   File "/usr/lib/python2.7/threading.py", line 504, in run
>     self.__target(*self.__args, **self.__kwargs)
>   File "/usr/lib/python2.7/multiprocessing/pool.py", line 319, in _handle_tasks
>     put(task)
> PicklingError: Can't pickle <type 'instancemethod'>: attribute lookup __builtin__.instancemethod failed

Any help would be very appreciated.

The multiprocessing module uses the pickle module to serialize the arguments passed to the function (f), which is executed in another process.

Many of the built-in types can be pickled, but in Python 2 (the version shown in the traceback) instance methods cannot. So .05 is fine, but x.some_func1 isn't. See "What can be pickled and unpickled?" in the pickle documentation for more details.
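To see which element of an argument tuple is the culprit, you can try pickling each one on its own. The following is a minimal sketch; the helper name is_picklable and the class Example are hypothetical and only there for illustration. The result for the bound method depends on the Python version: on Python 2.7 it should report False, while recent Python 3 releases can usually pickle bound methods of picklable instances.

```python
import pickle


def is_picklable(obj):
    """Return True if pickle can serialize obj, False otherwise."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError, TypeError):
        return False


class Example(object):  # hypothetical class, for illustration only
    def some_func1(self, v):
        return v


x = Example()
candidates = [
    ("float", .05),
    ("bound method", x.some_func1),  # result depends on the Python version
    ("lambda", lambda v: v),         # lambdas cannot be pickled by name
]
for label, value in candidates:
    print("%s: %s" % (label, is_picklable(value)))
```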

There's no simple solution. You'll need to restructure your program so instance methods don't need to be passed as arguments (or avoid using multiprocessing).
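One possible restructuring, as a sketch only: pass the instance itself plus the method names as strings, and look the methods up with getattr inside the worker. This assumes the instances themselves are picklable. The class Model, its scale attribute, and the body of some_complex_function are hypothetical stand-ins, since the question does not show them.

```python
from multiprocessing import Pool


class Model(object):  # hypothetical stand-in for the question's class
    def __init__(self, scale):
        self.scale = scale

    def some_func1(self, v):
        return v * self.scale

    def some_func2(self, v):
        return v + self.scale


def some_complex_function(func1, value, func2):
    # Hypothetical body; the real function is not shown in the question.
    return func2(func1(value))


def f(tup):
    # Receive the instance and method names, then rebuild the bound
    # methods in the worker process instead of pickling them.
    obj, name1, value, name2 = tup
    return some_complex_function(getattr(obj, name1), value, getattr(obj, name2))


def main():
    list_of_some_class = [Model(s) for s in (1, 2, 3)]
    _args = [(x, "some_func1", .05, "some_func2") for x in list_of_some_class]
    pool = Pool(processes=2)
    try:
        return pool.map(f, _args)
    finally:
        pool.close()
        pool.join()


if __name__ == "__main__":
    print(main())
```

Only the instance, two strings, and a float cross the process boundary here, so nothing of type instancemethod has to be pickled.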

If you use a fork of multiprocessing called pathos.multiprocessing, you can directly use classes and class methods in multiprocessing's map functions. This is because dill is used instead of pickle or cPickle, and dill can serialize almost anything in Python.

pathos.multiprocessing also provides an asynchronous map function… and it can map functions with multiple arguments (e.g. map(math.pow, [1,2,3], [4,5,6])).

See: What can multiprocessing and dill do together?

and: http://matthewrocklin.com/blog/work/2013/12/05/Parallelism-and-Serialization/

>>> from pathos.multiprocessing import ProcessingPool as Pool
>>> 
>>> p = Pool(4)
>>> 
>>> def add(x,y):
...   return x+y
... 
>>> x = [0,1,2,3]
>>> y = [4,5,6,7]
>>> 
>>> p.map(add, x, y)
[4, 6, 8, 10]
>>> 
>>> class Test(object):
...   def plus(self, x, y): 
...     return x+y
... 
>>> t = Test()
>>> 
>>> p.map(Test.plus, [t]*4, x, y)
[4, 6, 8, 10]
>>> 
>>> p.map(t.plus, x, y)
[4, 6, 8, 10]

Get the code here: https://github.com/uqfoundation/pathos
