
Multiprocessing in Python with an unknown number of processors

This is probably a simple question, but after a couple of days of reading documentation, blog posts, and search results, I haven't found a straightforward answer.

When using the multiprocessing module (https://docs.python.org/3/library/multiprocessing.html) in Python, does the module distribute the work evenly across the given number of processors/cores?

More specifically, if I am doing development work on my local machine with four processors, and I write a function that uses multiprocessing to execute six functions, do three or four of them run in parallel and then the others run after something has finished? And, when I deploy it to production with six processors, do all six of those run in parallel?

I am trying to understand how much I need to direct the multiprocessing library. I have seen no such direction in code samples, so I am assuming it's handled automatically. I want to be sure I can safely use this in multiple environments.

EDIT

After some comments, I wanted to clarify. I may be misunderstanding something.

I have several different functions I want to run at the same time. I want each of those functions to run on its own core. Speed is very important. My question is: "If I have five functions, and only four cores, how is this handled?"

Thank you.

The short answer is: if you don't specify a number of processes, the default is to spawn as many processes as your machine has cores, as reported by multiprocessing.cpu_count().
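
As a minimal, self-contained sketch of that default behaviour (the square function is just a stand-in for your own work):

import multiprocessing

def square(x):
    return x * x

if __name__ == "__main__":
    # The default Pool size equals the core count reported here.
    print(multiprocessing.cpu_count())

    # Pool() with no arguments is equivalent to Pool(processes=multiprocessing.cpu_count()).
    with multiprocessing.Pool() as pool:
        print(pool.map(square, range(6)))  # six tasks shared among however many workers exist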

The long answer is that it depends on how you are creating the subprocesses...

If you create a Pool object and then use it with map, starmap, or a similar function, it will create cpu_count() worker processes as described above, or you can use the processes argument to specify a different number of worker processes to spawn. The map function then distributes the work across those processes.

import multiprocessing

with multiprocessing.Pool(processes=N) as pool:
    rets = pool.map(func, args)  # blocks until every task has finished

How the work is distributed by the map function can get a little complicated, and you're best off reading the docs in detail if you care enough about performance to worry about chunk sizes and the like.
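
For instance, map takes an optional chunksize argument that controls how many items each worker is handed at a time; the numbers below are purely illustrative:

import multiprocessing

def work(x):
    return x * 2

if __name__ == "__main__":
    with multiprocessing.Pool(processes=4) as pool:
        # Larger chunks mean less inter-process overhead but coarser load balancing.
        results = pool.map(work, range(1000), chunksize=50)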

There are also other libraries that manage parallel processing at a higher level and offer lots of options, such as Joblib and parmap. Again, it's best to read the docs.
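
As a rough sketch of what that looks like with Joblib (assuming it is installed; parmap has a similar flavour):

from joblib import Parallel, delayed

def work(x):
    return x * x

if __name__ == "__main__":
    # n_jobs=-1 asks Joblib to use all available cores; it handles batching itself.
    results = Parallel(n_jobs=-1)(delayed(work)(i) for i in range(10))
    print(results)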

If you specifically want to launch a number of processes equal to the number of jobs you have, and don't care that it might be more than the number of CPUs in the machine, you can use the Process object instead of the Pool object. This interface parallels the way the threading library is used for concurrency.

For example:

import multiprocessing

# Start one process per job; this can exceed the number of cores,
# in which case the operating system time-slices them across the cores.
jobs = []
for _ in range(num_jobs):
    job = multiprocessing.Process(target=func, args=args)
    job.start()
    jobs.append(job)

# Wait for them all to finish.
for job in jobs:
    job.join()

Treat the above as pseudocode: you won't be able to copy-paste it and have it work unless, of course, you're launching multiple instances of the same function with the same arguments, since num_jobs, func, and args are placeholders for your own values.
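
To make that concrete for your case of several different functions, here is a hedged sketch: task_a and task_b are hypothetical stand-ins for your own functions, and if you start more processes than you have cores, the operating system simply time-slices them.

import multiprocessing

def task_a():
    print("task_a finished")

def task_b(n):
    print("task_b finished with", n)

if __name__ == "__main__":
    # One Process per function; the OS schedules them across the available cores,
    # so e.g. five processes on four cores all still run, just not all at the same instant.
    jobs = [
        multiprocessing.Process(target=task_a),
        multiprocessing.Process(target=task_b, args=(42,)),
    ]
    for job in jobs:
        job.start()
    for job in jobs:
        job.join()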
