I have a Python module that uses multiprocessing. I'm executing this module from another script with runpy. However, this results in (1) the module running twice, and (2) the multiprocessing jobs never finishing (the script just hangs).
In my minimal working example, I have a script runpy_test.py:
import runpy
runpy.run_module('module_test')
and a directory module_test containing an empty __init__.py and a __main__.py:
from multiprocessing import Pool
print 'start'
def f(x):
    return x*x
pool = Pool()
result = pool.map(f, [1,2,3])
print 'done'
When I run runpy_test.py, I get:
start
start
and the script hangs.
If I remove the pool.map call (or if I run __main__.py directly, including the pool.map call), I get:
start
done
I'm running this on Scientific Linux 7.6 in Python 2.7.5.
Try defining your function f in a separate module. It needs to be serialised to be passed to the pool processes, and those processes then need to recreate it by importing the module in which it is defined. However, the __main__.py file it is defined in isn't a module, or at least not a well-behaved one. Attempting to import it would create another Pool and invoke map again, which seems like a recipe for disaster.
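To illustrate the serialisation step described above, here is a minimal sketch of how pickle (which multiprocessing uses to send work to pool processes) handles a function. It uses json.dumps purely as a convenient, importable stand-in for f; the variable names are illustrative only.

```python
import json
import pickle

# Pickle an importable function; json.dumps is only a stand-in for f.
payload = pickle.dumps(json.dumps)

# The payload names the module and the function instead of holding its code.
has_module_name = b"json" in payload
has_function_name = b"dumps" in payload

# Unpickling imports the named module and looks the function up again.
restored = pickle.loads(payload)
same_object = restored is json.dumps

print(has_module_name, has_function_name, same_object)
```

Because only the names travel, the receiving process has to import the module that defines the function, and importing it runs whatever code sits at that module's top level.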
Rewrite your __main__.py like so:
from multiprocessing import Pool
from .implementation import f
print 'start'
pool = Pool()
result = pool.map(f, [1,2,3])
print 'done'
And then write an implementation.py (you can call this whatever you want) in which your function is defined:
def f(x):
    return x*x
Otherwise you will have the same problem with most interfaces in multiprocessing, independently of using runpy. As @Weeble explained, when Pool.map tries to load the function f in each sub-process, it will import <your_package>.__main__, where your function is defined. Since you have executable code at module level in __main__, that code will be re-executed by the sub-process.
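As a rough check of which module name a function picks up under runpy, the sketch below builds a throwaway package (demo_pkg, a hypothetical stand-in for module_test) and inspects the globals that runpy.run_module returns. It involves no multiprocessing; it only shows the name under which f would be looked up.

```python
import os
import runpy
import shutil
import sys
import tempfile

# Build a throwaway package; demo_pkg is a hypothetical stand-in for module_test.
root = tempfile.mkdtemp()
pkg_dir = os.path.join(root, "demo_pkg")
os.mkdir(pkg_dir)
open(os.path.join(pkg_dir, "__init__.py"), "w").close()
with open(os.path.join(pkg_dir, "__main__.py"), "w") as fh:
    fh.write("def f(x):\n    return x * x\n")

sys.path.insert(0, root)
try:
    # Default: the code runs under the name "<package>.__main__".
    default_globals = runpy.run_module("demo_pkg")
    # With run_name set, the code runs under the name "__main__" instead.
    main_globals = runpy.run_module("demo_pkg", run_name="__main__")
finally:
    sys.path.remove(root)
    sys.modules.pop("demo_pkg", None)
    shutil.rmtree(root)

default_module = default_globals["f"].__module__
main_module = main_globals["f"].__module__
print(default_module)  # demo_pkg.__main__
print(main_module)     # __main__
```

With the default call, f records demo_pkg.__main__ as its module, so anything that has to re-import f by name ends up importing the package's __main__ and executing its top-level code. The run_name variant is included only to show how the recorded name changes; this sketch does not test whether it avoids the hang.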
Aside from this technical reason, this is also better design in terms of separation of concerns and testing. Now you can easily import and call the function f (including for test purposes) without running it in parallel.
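For instance, a serial check of f might look like the sketch below. Here f is redefined inline so the snippet is self-contained; in the layout above it would instead be imported from implementation.py, and the names inputs and serial_result are illustrative.

```python
# Stand-in for the f defined in implementation.py.
def f(x):
    return x * x

# Call f serially, with no Pool involved, on the same inputs as the question.
inputs = [1, 2, 3]
serial_result = [f(x) for x in inputs]
print(serial_result)  # [1, 4, 9]
```

The same list is what pool.map(f, [1,2,3]) is expected to return, so a serial check like this can validate f before any worker processes are involved.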
Although not the "right" way to do it, one solution that ended up working for me was to use runpy's _run_module_as_main instead of run_module. This was ideal for me since I was working with someone else's code and it required the fewest changes.