I have a function which I would like to be executed several times in parallel, but with only a defined number of instances at the same time.
The natural way to do this seems to be to use multiprocessing.Pool. Specifically, the documentation says that
A frequent pattern (...) is to allow a worker within a pool to complete only a set amount of work before being exiting, being cleaned up and a new process spawned to replace the old one. The maxtasksperchild argument to the Pool exposes this ability to the end user.
maxtasksperchild is defined as:
maxtasksperchild is the number of tasks a worker process can complete before it will exit and be replaced with a fresh worker process, to enable unused resources to be freed. The default maxtasksperchild is None, which means worker processes will live as long as the pool.
I am not clear what "task" means here. If I want to have, say, at most 4 instances of my worker running in parallel, should I create the multiprocessing.Pool as
pool = multiprocessing.Pool(processes=4, maxtasksperchild=4)
How do processes and maxtasksperchild work together? Could I set processes to 10 and still have only 4 workers running (effectively leaving 6 processes idle)?
As the documentation says (and as you quoted in your question):
processes is the number of worker processes that can run in parallel; if it is not set, it defaults to the number of CPUs on your machine.
maxtasksperchild is the maximum number of tasks each worker process may handle: once a worker has finished maxtasksperchild tasks, it exits, and a new process is started and added to the Pool.
So to cap the number of parallel instances at 4, processes=4 is enough; maxtasksperchild only controls how often workers are recycled, not how many run at once.
Let's check with some code:
import os
import sys
from multiprocessing import Pool

def f(x):
    # Report which worker process handles which input.
    print("pid:", os.getpid(), "deal with", x)
    sys.stdout.flush()

if __name__ == '__main__':
    pool = Pool(processes=4, maxtasksperchild=2)
    keys = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
    result = pool.map(f, keys)
Here we use 4 processes, and each of them exits after executing 2 tasks. Running the code prints something like:
pid: 10899 deal with 1
pid: 10900 deal with 2
pid: 10901 deal with 3
pid: 10899 deal with 5
pid: 10900 deal with 6
pid: 10901 deal with 7
pid: 10902 deal with 4
pid: 10902 deal with 8
pid: 10907 deal with 9
pid: 10907 deal with 10
Processes 10899 through 10902 each exit after executing 2 tasks, and a new process, 10907, is started to execute the last two.
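Instead of reading the PIDs off the console, the same check can be done in the parent by having each task return its PID and counting. This is only a minimal sketch: work and tasks_per_pid are hypothetical names that do not appear in the code above, and chunksize=1 is passed explicitly so that every input is its own task.

```python
import os
from collections import Counter
from multiprocessing import Pool


def work(x):
    # Return the worker's PID instead of printing it, so the parent can count.
    return os.getpid(), x


def tasks_per_pid(pairs):
    """Count how many inputs each worker PID handled."""
    return Counter(pid for pid, _ in pairs)


if __name__ == "__main__":
    # chunksize=1 is an explicit assumption: one input per task.
    with Pool(processes=4, maxtasksperchild=2) as pool:
        pairs = pool.map(work, range(1, 11), chunksize=1)
    for pid, n in sorted(tasks_per_pid(pairs).items()):
        print("pid", pid, "handled", n, "task(s)")
```

With maxtasksperchild=2 and 10 inputs, no PID should be reported with more than 2 tasks, so at least 5 different PIDs should show up even though only 4 workers exist at any one time.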
For comparison, suppose we use a larger maxtasksperchild or the default value (None, which means workers are never replaced and live as long as the Pool), as in the following code:
if __name__ == '__main__':
    pool = Pool(processes=4, maxtasksperchild=10)
    keys = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
    result = pool.map(f, keys)
The result:
pid: 13352 deal with 1
pid: 13353 deal with 2
pid: 13352 deal with 4
pid: 13354 deal with 3
pid: 13353 deal with 6
pid: 13352 deal with 7
pid: 13355 deal with 5
pid: 13354 deal with 8
pid: 13353 deal with 9
pid: 13355 deal with 10
As you can see, no new process is created, and all tasks are finished by the original 4 processes.
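One caveat about what a "task" is: as far as I can tell from CPython's implementation, pool.map splits the input into chunks and each chunk counts as one task for maxtasksperchild. With 10 inputs and 4 workers the default chunk size works out to 1, which is why the runs above show exactly one input per task. The sketch below mirrors that default calculation; default_chunksize is a hypothetical helper name, and the formula is an implementation detail rather than documented behaviour, so treat it as an assumption.

```python
def default_chunksize(n_items, n_workers):
    """Mirror of the chunk size CPython's Pool.map appears to pick by default."""
    # Assumption: roughly 4 chunks per worker, rounded up.
    chunksize, extra = divmod(n_items, n_workers * 4)
    if extra:
        chunksize += 1
    return chunksize


def max_inputs_per_worker(n_items, n_workers, maxtasksperchild):
    # Each chunk is one task, so a worker may see up to this many inputs
    # before it is replaced.
    return default_chunksize(n_items, n_workers) * maxtasksperchild


if __name__ == "__main__":
    print(default_chunksize(10, 4))
    print(default_chunksize(1000, 4))
    print(max_inputs_per_worker(1000, 4, 2))
```

If you need a worker to be replaced after an exact number of inputs, passing chunksize=1 to pool.map explicitly avoids depending on this default.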
Hope this helps.