
How do I tell if Python's Multiprocessing module is using all of my cores for calculations?

I have some simple code from a tutorial like this:

from multiprocessing import Process, Lock
import os

def f(i):
    print 'hello world', i
    print 'parent process:', os.getppid()
    print 'process id:', os.getpid(), "\n\n"

if __name__ == '__main__':
    lock = Lock()

    for num in range(10):
        p = Process(target=f, args=(num,))
        p.start()
    p.join()

How can I tell if this is utilising both of my cores? Currently I'm running Ubuntu 11.04 w/ 3 GB RAM and Intel Core 2 Duo @ 2.2GHz.

The project I'm learning this for is going to be moved to a huge machine in somebody's office, with much more horsepower than I currently have at my disposal. Specifically, the processor will have at least 4 cores, and I want to be sure to get my algorithm to automatically detect and utilise all available cores. Also, that system will potentially be something other than Linux, so are there any common pratfalls that I have to watch for when moving the Multiprocessing module between OS's?

Oh yeah, also, the output of the script looks something like this:

hello world 0
parent process: 29362
process id: 29363 


hello world 1
parent process: 29362
process id: 29364 


hello world 2
parent process: 29362
process id: 29365 

and so on...

So from what I know so far, the PPIDs are all the same because the script, when run, is the parent process that spawns the child processes, each of which is a separate process. So does multiprocessing automatically detect and handle multiple cores, or do I have to tell it where to look? Also, from what I read while searching for a copy of this question, I shouldn't spawn more processes than there are cores, because the extra processes eat up system resources that would otherwise be used for computation.

Thanks in advance for your help, my thesis loves you.

Here's a handy little command I use to monitor my cores from the command line:

watch -d "mpstat -P ALL 1 1 | head -n 12"

Note that the mpstat command must be available on your system, which you can get on Ubuntu by installing the sysstat package.

sudo apt-get install sysstat

If you want to detect the number of available cores from Python, you can do so using the multiprocessing.cpu_count() function. Note that it reports logical cores, so on Intel CPUs with Hyper-Threading it will be double the number of physical cores. Launching as many processes as you have available cores will usually scale to fully occupy all cores on your machine, as long as each process has enough work to do and doesn't get bogged down with communication. Linux's process scheduler will take it from there.
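For instance, here is a minimal sketch (using Python 3 print syntax, unlike the Python 2 code above) that sizes the number of worker processes to the reported core count:

```python
import multiprocessing
import os

def f(i):
    # placeholder for real computational work
    print('worker', i, 'pid:', os.getpid())

if __name__ == '__main__':
    n = multiprocessing.cpu_count()  # logical cores, including Hyper-Threading
    procs = [multiprocessing.Process(target=f, args=(i,)) for i in range(n)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()  # wait for every worker to finish
```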

A few things about your code sample: you create a lock but never use it, and you only join on the last process you started. Right now the children probably finish so quickly that you won't see an issue, but if any of the earlier processes took longer than the last one, the main script could exit before they are done.
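One way to fix the join issue (a sketch in Python 3 syntax, not necessarily how the original tutorial intended it) is to keep a reference to every process and join each of them:

```python
from multiprocessing import Process
import os

def f(i):
    print('hello world', i, 'parent:', os.getppid(), 'pid:', os.getpid())

if __name__ == '__main__':
    processes = []
    for num in range(10):
        p = Process(target=f, args=(num,))
        p.start()
        processes.append(p)
    for p in processes:
        p.join()  # wait for all children, not just the last one started
```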

Regarding making sure each process ends up on a different core: unfortunately, you can't. That decision is made by the operating system's scheduler. You are simply writing code that uses multiple processes so that the system can schedule them in parallel; some may end up on the same core.

Pitfalls (pratfalls?) might be that your actual code doesn't really require multiple processes and could instead benefit more from threading. Also, you have to be very careful with how you share memory in multiprocessing: interprocess communication involves much more overhead than inter-thread communication, so it's usually reserved for cases where threading simply won't get you what you need.
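To illustrate what explicit memory sharing looks like, here is a hedged sketch using multiprocessing.Value and its built-in lock; the increment function and the counts are made up for illustration:

```python
from multiprocessing import Process, Value

def increment(counter, n):
    for _ in range(n):
        with counter.get_lock():  # synchronize access across processes
            counter.value += 1

if __name__ == '__main__':
    counter = Value('i', 0)  # shared 32-bit signed integer
    procs = [Process(target=increment, args=(counter, 1000)) for _ in range(4)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    print(counter.value)  # prints 4000
```

Without the lock around the read-modify-write, the four processes would race and the final count could come up short.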

If you are on a Unix system, you could try running the 'top' command and seeing how many of your processes show up concurrently. Although it is somewhat empirical, many times just looking at the process list will let you see the multiple processes.

Although, looking at your script, I don't see where you are distributing real work across multiple processes. You can import multiprocessing.Pool and then map your function across a pool of worker processes.
http://docs.python.org/library/multiprocessing.html
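A short sketch of that Pool approach (Python 3 syntax; `work` is a made-up stand-in for your actual algorithm):

```python
from multiprocessing import Pool

def work(x):
    return x * x  # stand-in for a real computation

if __name__ == '__main__':
    # Pool() defaults to cpu_count() worker processes
    with Pool() as pool:
        results = pool.map(work, range(8))
    print(results)  # prints [0, 1, 4, 9, 16, 25, 36, 49]
```

pool.map splits the input across the workers and collects the results in order, which is usually the simplest way to spread a pure function over all cores.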
