
Python multiprocessing: is it possible to have a pool inside of a pool?

I have a module A that performs a basic map/reduce by taking data, sending it to modules B, C, D, etc. for analysis, and then joining their results together.

But it appears that modules B, C, D, etc. cannot themselves create a multiprocessing pool, or else I get

AssertionError: daemonic processes are not allowed to have children

Is it possible to parallelize these jobs some other way?

For clarity, here's a(n admittedly bad) baby example. (I would normally use try/except, but you get the gist.)

A.py:

  import B
  from multiprocessing import Pool

  def main():
    p = Pool()
    results = p.map(B.foo, range(10))
    p.close()
    p.join()
    return results

  if __name__ == "__main__":
    print(main())


B.py:

  from multiprocessing import Pool

  def foo(x):
    p = Pool()  # raises AssertionError: Pool workers are daemonic
    results = p.map(str, range(x + 1))
    p.close()
    p.join()
    return results
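
The reason is that Pool starts its workers with daemon=True, and multiprocessing forbids daemonic processes from starting children of their own. A minimal check that confirms this (show_daemon is an illustrative name, not part of the example above):

  from multiprocessing import Pool, current_process

  def show_daemon(_):
    # Pool workers report daemon=True; starting a child from such a
    # process is what triggers the AssertionError
    return current_process().daemon

  if __name__ == "__main__":
    p = Pool(2)
    print(p.map(show_daemon, range(2)))  # [True, True]
    p.close()
    p.join()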

Is it possible to have a pool inside of a pool?

Yes, it is possible, though it might not be a good idea unless you want to raise an army of zombies. From Python Process Pool non-daemonic?:

import multiprocessing.pool
from contextlib import closing
from functools import partial

class NoDaemonProcess(multiprocessing.Process):
    # make 'daemon' attribute always return False
    def _get_daemon(self):
        return False
    def _set_daemon(self, value):
        pass
    daemon = property(_get_daemon, _set_daemon)

# We sub-class multiprocessing.pool.Pool instead of multiprocessing.Pool
# because the latter is only a wrapper function, not a proper class.
class Pool(multiprocessing.pool.Pool):
    Process = NoDaemonProcess

def foo(x, depth=0):
    if depth == 0:
        return x
    else:
        with closing(Pool()) as p:
            return p.map(partial(foo, depth=depth-1), range(x + 1))

if __name__ == "__main__":
    from pprint import pprint
    pprint(foo(10, depth=2))

Output

[[0],
 [0, 1],
 [0, 1, 2],
 [0, 1, 2, 3],
 [0, 1, 2, 3, 4],
 [0, 1, 2, 3, 4, 5],
 [0, 1, 2, 3, 4, 5, 6],
 [0, 1, 2, 3, 4, 5, 6, 7],
 [0, 1, 2, 3, 4, 5, 6, 7, 8],
 [0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
 [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]]
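
Note that this recipe matches the Pool internals of the Python versions it was written for. On Python 3.8+, Pool looks its Process class up on a context object, so the override has to go on the context instead. A hedged adaptation (NoDaemonContext and NestablePool are names chosen here, not part of multiprocessing):

import multiprocessing
import multiprocessing.pool

class NoDaemonProcess(multiprocessing.Process):
    # always report non-daemonic, and silently ignore attempts to change it
    @property
    def daemon(self):
        return False

    @daemon.setter
    def daemon(self, value):
        pass

class NoDaemonContext(type(multiprocessing.get_context())):
    Process = NoDaemonProcess

class NestablePool(multiprocessing.pool.Pool):
    def __init__(self, *args, **kwargs):
        # route worker creation through the non-daemonic context
        kwargs["context"] = NoDaemonContext()
        super().__init__(*args, **kwargs)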

concurrent.futures supports it by default:

# $ pip install futures # on Python 2
from concurrent.futures import ProcessPoolExecutor as Pool
from functools import partial

def foo(x, depth=0):
    if depth == 0:
        return x
    else:
        with Pool() as p:
            return list(p.map(partial(foo, depth=depth-1), range(x + 1)))

if __name__ == "__main__":
    from pprint import pprint
    pprint(foo(10, depth=2))

It produces the same output.
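
Keep in mind that every nested level spawns its own full set of workers, so nested pools oversubscribe the CPUs quickly. If the work can be flattened into a list of leaf tasks, a single pool sidesteps both the daemon restriction and the oversubscription. A minimal sketch of that alternative (leaf and the task layout are illustrative, not from the question):

from concurrent.futures import ProcessPoolExecutor

def leaf(args):
    # the innermost unit of work, analogous to one item handled by B.foo
    x, y = args
    return str(y)

if __name__ == "__main__":
    # enumerate every leaf task up front instead of nesting pools
    tasks = [(x, y) for x in range(10) for y in range(x + 1)]
    with ProcessPoolExecutor() as pool:
        print(list(pool.map(leaf, tasks)))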

Is it possible to parallelize these jobs some other way?

Yes. For example, look at how celery allows you to create complex workflows.
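
As a minimal sketch of that shape, celery's chord primitive runs a group of tasks in parallel and feeds their collected results to a callback, which is exactly the map/reduce layout from the question. This assumes a Redis broker at the default address; the task names and URLs here are illustrative:

from celery import Celery, chord

app = Celery("tasks",
             broker="redis://localhost:6379/0",
             backend="redis://localhost:6379/0")

@app.task
def analyze(x):
    # stand-in for the analysis done by modules B, C, D, ...
    return str(x)

@app.task
def join_results(results):
    # the reduce step: celery delivers the group's results as a list
    return results

if __name__ == "__main__":
    # the header tasks run in parallel on workers; the callback joins them
    result = chord(analyze.s(i) for i in range(10))(join_results.s())
    print(result.get())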
