
Control Number of Processes in Python using multiprocessing

I would like to control the number of Processes spawned while using the multiprocessing package.

Say I only want three processes active at the same time. The only way I know how to do this is:

import multiprocessing
import Queue  # "queue" on Python 3

def worker(arg):
    ## Do stuff
    return returnvalue

argument = [1, 2, 3, 4, 5, 6]
jobs = Queue.Queue()
for arg in argument:
    # Wait on the oldest process while more than two are still queued
    while jobs.qsize() > 2:
        jobs.get().join()
    p = multiprocessing.Process(target=worker, args=(arg,))
    jobs.put(p)
    p.start()

Basically I only know how to monitor one process at a time, using the Process.join() function: I monitor the oldest process until it is done and then create a new process. For my program the oldest process should, on average, finish before the others. But who knows? Another process might finish first and I would have no way of knowing.

The only alternative I can think of is something like this:

import multiprocessing
import time

def worker(arg):
    ## Do stuff
    return returnvalue

argument = [1, 2, 3, 4, 5, 6]
aliveprocesses = 0
jobs = set()
for arg in argument:
    # Poll until fewer than three processes are still running
    while aliveprocesses > 2:
        for j in jobs:
            if not j.is_alive():
                jobs.remove(j)
                aliveprocesses -= 1
                break
        else:
            # Everything is still alive; wait a bit and check again
            time.sleep(1)
    p = multiprocessing.Process(target=worker, args=(arg,))
    jobs.add(p)
    p.start()
    aliveprocesses += 1

In the code above you check every process to see whether it is still alive. If they are all still alive, you sleep for a bit and then check again, repeating until a process has died, at which point you spawn a new one. The problem here is that, from what I understand, time.sleep() is not a particularly efficient way to wait for a process to end.

Ideally I would like a function "superjoin()" that works like Process.join(), only it takes a set of Process objects, and it returns as soon as any one Process in the set finishes. And superjoin() should not itself use time.sleep(), i.e. the buck should not just be passed along.
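
For reference, on Python 3.3+ a helper along these lines can be sketched with Process.sentinel and multiprocessing.connection.wait(), which block in the operating system instead of polling; the name superjoin and its set-based interface below are simply taken from the wish above:

import multiprocessing
from multiprocessing.connection import wait

def superjoin(processes):
    # Block until at least one started Process in the set has exited.
    # wait() sleeps on the processes' sentinel handles at the OS level,
    # so there is no time.sleep() polling involved.
    sentinels = {p.sentinel: p for p in processes}
    ready = wait(list(sentinels))
    # Return the subset of processes that have finished.
    return {sentinels[s] for s in ready}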

Since you seem to have a single (parallel) task, instead of managing processes individually, you should use the higher-level multiprocessing.Pool, which makes managing the number of processes easier.

You can't join a pool, but there are blocking calls (such as Pool.map) that perform this kind of task.
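
As a rough sketch of what that could look like for the example above (worker and the argument list are the placeholders from the question):

import multiprocessing

def worker(arg):
    ## Do stuff
    return arg  # placeholder return value

if __name__ == '__main__':
    argument = [1, 2, 3, 4, 5, 6]
    # At most three worker processes are alive at any time; the pool
    # takes care of spawning, dispatching and joining them.
    pool = multiprocessing.Pool(processes=3)
    results = pool.map(worker, argument)  # blocks until all work is done
    pool.close()
    pool.join()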

If you need finer-grained control, you may want to adapt Pool's source code.
