
Python multiprocessing using pickle, kwargs, and function references

I run into problems with every efficient multiprocessing framework I try in Python (whether scoop or multiprocessing).

I have the following situation:

  1. One class 'Foo' which holds a function 'f'
  2. A second class 'Bar' which gets the arguments (kwargs) and holds an instance of class 'Foo' (containing the function)
  3. In 'Bar', the function of class Foo is executed using the arguments given.
  4. The results are, for reasons of statistical significance, averaged over multiple runs, in this case 10 times (not shown in given example).

Here is the example:

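import multiprocessing as mp

class Foo:
    def __init__(self, f):
        self.f = f

class Bar:
    def __init__(self, foo, **kwargs):
        self.args = kwargs
        self.foo = foo

    def execute(self):
        pool = mp.Pool(5)
        # This lambda is what pickle cannot serialize when pool.map
        # sends the task to the worker processes.
        f = lambda x : self.foo.f(**x)
        # Run the same arguments 10 times so the results can be averaged.
        args = [self.args] * 10
        results = pool.map(f, args)
        return results

if __name__ == '__main__':
    def anything(**kwargs):
        print(kwargs['z'])
        return kwargs['x'] * kwargs['y']
    foo = Foo(anything)
    args = {'x':10, 'y':27, 'z':'Hello'}
    # Bar needs the Foo instance as its first argument.
    bar = Bar(foo, **args)
    bar.execute()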

I know that functions must be defined at module level in order to be picklable. Is there any way to make the function picklable? Unfortunately, I am not very experienced in Python OOP, so I am probably missing an important point! Thank you!

EDIT: Unfortunately, even with the "multiprocess" module, which uses dill instead of pickle (thanks to Mike McKerns), my problem is not reliably solved. For some short runs of my program, things are fine. For some reason, multiprocess seems to produce race conditions, as I get the following error:

Exception in thread Thread-3:
Traceback (most recent call last):
  File "/usr/lib/python2.7/threading.py", line 801, in __bootstrap_inner
    self.run()
  File "/usr/lib/python2.7/threading.py", line 754, in run
    self.__target(*self.__args, **self.__kwargs)
  File "/usr/local/lib/python2.7/dist-packages/multiprocess/pool.py", line 389, in _handle_results
    task = get()
  File "/usr/local/lib/python2.7/dist-packages/dill/dill.py", line 209, in loads
    return load(file)
  File "/usr/local/lib/python2.7/dist-packages/dill/dill.py", line 199, in load
    obj = pik.load()
  File "/usr/lib/python2.7/pickle.py", line 864, in load
    dispatch[key](self)
  File "/usr/lib/python2.7/pickle.py", line 1096, in load_global
    klass = self.find_class(module, name)
  File "/usr/local/lib/python2.7/dist-packages/dill/dill.py", line 353, in find_class
    return StockUnpickler.find_class(self, module, name)
  File "/usr/lib/python2.7/pickle.py", line 1132, in find_class
    klass = getattr(mod, name)
AttributeError: 'module' object has no attribute 'Individual'

(Individual is a class used by my program, a genetic algorithm built with deap.) Any ideas?

(repeating the comments above)

Substitute the multiprocess package for multiprocessing and the code should work with no other changes. This is because multiprocess is a fork of multiprocessing that uses dill instead of pickle … so you are able to serialize almost anything in Python, including stuff you write in the interpreter session. That's the only change made for the fork of multiprocessing.

See https://stackoverflow.com/a/21345273/2379433 and https://stackoverflow.com/a/21345308/2379433 and https://stackoverflow.com/a/21345423/2379433, among others.
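
If adding a dependency is not an option, one alternative with the standard library alone is to replace the lambda in Bar.execute with a small callable object, since pickle can serialize an instance of a module-level class but not a lambda. The following is only a minimal sketch of that idea, not the approach the answer above recommends: the names KwargsCaller, is_picklable, and average_runs are made up for illustration, and it assumes the wrapped function is itself defined with def at the top level of an importable module.

```python
import multiprocessing as mp
import pickle


class KwargsCaller:
    """Picklable stand-in for `lambda x: self.foo.f(**x)` (hypothetical name)."""

    def __init__(self, func):
        # Assumption: func is itself picklable, e.g. a function defined
        # with `def` at the top level of a module.
        self.func = func

    def __call__(self, kwargs):
        # Unpack the dict into keyword arguments, as the lambda did.
        return self.func(**kwargs)


def anything(**kwargs):
    return kwargs['x'] * kwargs['y']


def is_picklable(obj):
    """Return True if the standard pickle module can serialize obj."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError, TypeError):
        return False


def average_runs(func, kwargs, runs=10, processes=5):
    """Call func(**kwargs) `runs` times in a pool and average the results."""
    pool = mp.Pool(processes)
    try:
        results = pool.map(KwargsCaller(func), [kwargs] * runs)
    finally:
        pool.close()
        pool.join()
    return sum(results) / float(runs)
```

Here average_runs would be called from under the if __name__ == '__main__': guard, for example as average_runs(anything, {'x': 10, 'y': 27, 'z': 'Hello'}). The is_picklable helper is only there to check up front whether an object will survive the trip to a worker process; it reports False for a lambda and True for a KwargsCaller wrapping a top-level function.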
