
Using multiprocessing with runpy

I have a Python module that uses multiprocessing. I'm executing this module from another script with runpy. However, this results in (1) the module running twice, and (2) the multiprocessing jobs never finishing (the script just hangs).

In my minimal working example, I have a script runpy_test.py:

import runpy
runpy.run_module('module_test')

and a directory module_test containing an empty __init__.py and a __main__.py:

from multiprocessing import Pool

print 'start'
def f(x):
    return x*x
pool = Pool()
result = pool.map(f, [1,2,3])
print 'done'

When I run runpy_test.py, I get:


and the script hangs.

If I remove the pool.map call (or if I run __main__.py directly, including the pool.map call), I get:


I'm running this on Scientific Linux 7.6 in Python 2.7.5.

Try defining your function f in a separate module. It needs to be serialised to be passed to the pool processes, and then those processes need to recreate it, by importing the module it occurs in. However, the __main__.py file it occurs in isn't a module, or at least, not a well-behaved one. Attempting to import it would result in the creation of another Pool and another invocation of map, which seems like a recipe for disaster.
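To make the serialisation point concrete, here is a minimal sketch in Python 3 syntax of how pickle (which multiprocessing uses to pass work to pool processes) handles a module-level function. It uses json.dumps from the standard library as a stand-in for f, and protocol 2 is chosen only because its output is easy to read. The pickled bytes hold the module name and the function name rather than the function's code, so the receiving process has to import that module to get the function back.

```python
import json
import pickle

# Pickle a module-level function. json.dumps is only a stand-in for f.
payload = pickle.dumps(json.dumps, protocol=2)

# The payload is a short reference containing the names "json" and "dumps",
# not the bytecode of the function.
print(payload)

# Unpickling looks the function up by module and name, importing the module
# if necessary, and returns the very same function object.
restored = pickle.loads(payload)
print(restored is json.dumps)  # True
```

If the module that has to be imported runs a Pool at top level, that code runs again as part of the lookup.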

Rewrite your __main__.py like so:

from multiprocessing import Pool
from .implementation import f

print 'start'
pool = Pool()
result = pool.map(f, [1,2,3])
print 'done'

And then write an implementation.py (you can call this whatever you want) in which your function is defined:

def f(x):
    return x*x

Without this split you will have the same problem with most interfaces in multiprocessing, independently of using runpy. As @Weeble explained, when Pool.map tries to load the function f in each sub-process it will import <your_package>.__main__, where your function is defined, but since you have executable code at module level in __main__, it will be re-executed by the sub-process.
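The module name involved can be observed directly. The following is a minimal sketch in Python 3 syntax, assuming a throwaway package named demo_pkg (a hypothetical stand-in for module_test) written to a temporary directory. It suggests that a function defined in a package's __main__.py and executed through runpy.run_module records demo_pkg.__main__ as its module, while that module is not left in sys.modules. A process that resolves the function by name would therefore have to import it afresh.

```python
import importlib
import os
import runpy
import sys
import tempfile

# Build a throwaway package; "demo_pkg" is a hypothetical name.
with tempfile.TemporaryDirectory() as root:
    pkg_dir = os.path.join(root, "demo_pkg")
    os.mkdir(pkg_dir)
    open(os.path.join(pkg_dir, "__init__.py"), "w").close()
    with open(os.path.join(pkg_dir, "__main__.py"), "w") as fh:
        fh.write("def f(x):\n    return x * x\n")

    sys.path.insert(0, root)
    importlib.invalidate_caches()
    try:
        # run_module returns the globals of the executed module.
        namespace = runpy.run_module("demo_pkg")
    finally:
        sys.path.remove(root)

print(namespace["__name__"])                # demo_pkg.__main__
print(namespace["f"].__module__)            # demo_pkg.__main__
print("demo_pkg.__main__" in sys.modules)   # False
```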

Aside from this technical reason, this is also better design in terms of separation of concerns and testing. Now you can easily import and call (including for test purposes) the function f without running it in parallel.
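As an illustration of the testing point, here is a sketch of a serial check of f that needs no process pool. The function is repeated only to keep the block self-contained; in the layout described above it would be imported from implementation.py. The name test_f_squares is hypothetical.

```python
def f(x):
    return x * x


def test_f_squares():
    # Same inputs as the pool.map call, evaluated in the current process.
    assert [f(x) for x in [1, 2, 3]] == [1, 4, 9]


test_f_squares()
```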

Although not the "right" way to do it, one solution that ended up working for me was to use runpy's private function _run_module_as_main instead of run_module. This was ideal for me since I was working with someone else's code and it required the fewest changes.

