Will multiprocessing.Process() or multiprocessing.Pool() distribute more evenly across cores?

Is there any difference at all (in any way) between creating a pool of processes and simply creating separate processes in a loop?

What's the difference between this?:

pool = multiprocessing.Pool(5)
pool.apply_async(worker)
pool.close()  # required before join(), otherwise join() raises ValueError
pool.join()

and this?:

procs = []
for j in range(5):
    p = multiprocessing.Process(target=worker)  # target= keyword is required
    p.start()
    procs.append(p)

for p in procs:
    p.join()

Will pool be more likely to use more cores/processors?

The apply_async method of a pool will only run the worker function once, on an arbitrarily selected process from the pool, so your two code examples won't do exactly the same thing. To really be equivalent, you'd need to call apply_async five times.
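For illustration, here is a minimal sketch of what a genuinely equivalent pool version might look like (the no-argument worker body is a placeholder, not from the question):

import multiprocessing

def worker():
    # stand-in for the real job; assumed to take no arguments, as in the question
    print("working")

if __name__ == "__main__":
    pool = multiprocessing.Pool(5)
    for _ in range(5):
        pool.apply_async(worker)  # submit the job five times
    pool.close()  # no more submissions; lets join() return once work is done
    pool.join()   # wait for all five jobs to finish

Note that pool.close() must come before pool.join(); calling join() on a pool that is still accepting work raises a ValueError.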

I think which of the approaches is more appropriate for a given task depends a bit on what you are doing. multiprocessing.Pool allows you to do multiple jobs per process, which may make it easier to parallelize your program. For instance, if you have a million items that each need individual processing, you can create a pool with a reasonable number of processes (perhaps as many as you have CPU cores) and then pass the list of a million items to pool.map. The pool will distribute them among the worker processes (and collect the return values to be sent back to the parent process). Launching a million separate processes would be much less practical (it would probably break your OS).
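As a rough sketch of that pattern (process_item and its doubled return value are stand-ins for real per-item work):

import multiprocessing

def process_item(item):
    # placeholder per-item work; the return value goes back to the parent
    return item * 2

if __name__ == "__main__":
    items = range(1_000_000)
    with multiprocessing.Pool() as pool:      # defaults to os.cpu_count() workers
        results = pool.map(process_item, items)  # distributed across the pool
    print(results[:5])

Using the pool as a context manager tears it down automatically once map has returned with all the results collected.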

On the other hand, if you have a small number of jobs to run in parallel, and you only need each job done once, it may be perfectly reasonable to use a separate multiprocessing.Process for each job, rather than setting up a pool, launching the jobs, and then tearing down the pool.
