
How to limit the number of concurrent workers?

I have a function which I would like to be executed several times in parallel, but with only a defined number of instances at the same time.

The natural way to do this seems to be to use multiprocessing.Pool. Specifically, the documentation says that

A frequent pattern (...) is to allow a worker within a pool to complete only a set amount of work before being exiting, being cleaned up and a new process spawned to replace the old one. The maxtasksperchild argument to the Pool exposes this ability to the end user.

maxtasksperchild is defined as:

maxtasksperchild is the number of tasks a worker process can complete before it will exit and be replaced with a fresh worker process, to enable unused resources to be freed. The default maxtasksperchild is None, which means worker processes will live as long as the pool.

It is not clear to me what "task" means here. If I want, say, at most 4 instances of my worker running in parallel, should I create the multiprocessing.Pool as

pool = multiprocessing.Pool(processes=4, maxtasksperchild=4)

How do processes and maxtasksperchild work together? Could I set processes to 10 and still have only 4 workers running (effectively leaving 6 processes idle)?

As the documentation says (and as you quoted in your question):

processes is the number of worker processes that can run in parallel. If it is not set, it defaults to the number of CPUs on your machine.

maxtasksperchild is the maximum number of tasks each worker process may handle. Once a process has finished that many tasks, it exits and a new process is started and added to the Pool in its place.
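
To see the first point in practice, here is a minimal sketch, assuming Python 3. Each task sleeps briefly and returns its pid together with its start and end timestamps, and the parent then works out how many tasks overlapped at the peak. The helper names timed_task, peak_concurrency and measure, and the optional start_method argument, are my own illustrative choices and not anything from the question.

```python
import multiprocessing
import os
import time


def timed_task(x):
    """Sleep briefly and report which process ran this item and when."""
    start = time.time()
    time.sleep(0.1)
    return os.getpid(), start, time.time()


def peak_concurrency(intervals):
    """Return the largest number of (start, end) intervals that overlap."""
    events = []
    for start, end in intervals:
        events.append((start, 1))
        events.append((end, -1))
    # Tuples sort so that -1 comes before +1 at equal timestamps, which
    # keeps back-to-back tasks from being counted as overlapping.
    events.sort()
    running = peak = 0
    for _, delta in events:
        running += delta
        peak = max(peak, running)
    return peak


def measure(processes, n_tasks=12, start_method=None):
    """Run n_tasks on a pool and return (distinct pids, peak overlap)."""
    # start_method=None uses the platform default start method.
    ctx = multiprocessing.get_context(start_method)
    with ctx.Pool(processes=processes) as pool:
        results = pool.map(timed_task, range(n_tasks))
    pids = {pid for pid, _, _ in results}
    peak = peak_concurrency([(s, e) for _, s, e in results])
    return len(pids), peak


if __name__ == "__main__":
    n_pids, peak = measure(processes=4)
    print("distinct worker pids:", n_pids, "peak concurrent tasks:", peak)
```

With processes=4 I would expect the reported peak to be at most 4, no matter how many tasks are queued, because each worker runs one task at a time.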

Let's check this with some code:

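import os
import sys
from multiprocessing import Pool

def f(x):
    # Report which worker process handles this item.
    print("pid: ", os.getpid(), " deal with ", x)
    sys.stdout.flush()

if __name__ == '__main__':
    # 4 workers; each one is replaced after completing 2 tasks.
    pool = Pool(processes=4, maxtasksperchild=2)
    keys = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
    result = pool.map(f, keys)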

Here we use 4 processes, and each of them is killed after it has executed 2 tasks. After running the code, you should see something like:

pid:  10899  deal with  1
pid:  10900  deal with  2
pid:  10901  deal with  3
pid:  10899  deal with  5
pid:  10900  deal with  6
pid:  10901  deal with  7
pid:  10902  deal with  4
pid:  10902  deal with  8
pid:  10907  deal with  9
pid:  10907  deal with  10

Processes 10899 to 10902 are each killed after executing 2 tasks, and a new process, 10907, is started to execute the remaining two tasks (9 and 10).

For comparison, we can use a larger maxtasksperchild, or the default value of None (which means a process is never killed and lives as long as the Pool), as in the following code:

if __name__ == '__main__':
    pool = Pool(processes=4, maxtasksperchild=10)
    keys = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
    result = pool.map(f, keys)

The result:

pid:  13352  deal with  1
pid:  13353  deal with  2
pid:  13352  deal with  4
pid:  13354  deal with  3
pid:  13353  deal with  6
pid:  13352  deal with  7
pid:  13355  deal with  5
pid:  13354  deal with  8
pid:  13353  deal with  9
pid:  13355  deal with  10

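As you can see, no new process is created and all tasks are handled by the original 4 processes. So to have at most 4 workers running at once, set processes=4; maxtasksperchild only controls how often a worker is replaced, not how many run in parallel.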
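
The two runs above can also be compared programmatically. The following is only a sketch, assuming Python 3: it counts how many tasks each pid handled, with and without a maxtasksperchild limit. The names report_pid and tasks_per_pid and the start_method argument are hypothetical helpers of mine. I pass chunksize=1 explicitly because, as far as I know, in CPython pool.map groups items into chunks and each chunk counts as a single task for maxtasksperchild.

```python
import multiprocessing
import os
from collections import Counter


def report_pid(x):
    """Return the pid of the worker process that handled this item."""
    return os.getpid()


def tasks_per_pid(processes, maxtasksperchild, n_tasks=10, start_method=None):
    """Map n_tasks items over a pool and count the items handled per pid."""
    # start_method=None uses the platform default start method.
    ctx = multiprocessing.get_context(start_method)
    with ctx.Pool(processes=processes,
                  maxtasksperchild=maxtasksperchild) as pool:
        # chunksize=1 makes every item its own task.
        pids = pool.map(report_pid, range(n_tasks), chunksize=1)
    return Counter(pids)


if __name__ == "__main__":
    for limit in (2, None):
        counts = tasks_per_pid(processes=4, maxtasksperchild=limit)
        print("maxtasksperchild =", limit, "->", dict(counts))
```

With a limit of 2, no pid should appear more than twice, so 10 tasks need at least 5 distinct processes over the lifetime of the pool, even though only 4 exist at any moment. With None, at most 4 pids should appear.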

Hope this helps.
