
How do I tell if Python's Multiprocessing module is using all of my cores for calculations?

I have some simple code from a tutorial like this:

from multiprocessing import Process, Lock
import os

def f(i):
    print('hello world', i)
    print('parent process:', os.getppid())
    print('process id:', os.getpid(), "\n\n")

if __name__ == '__main__':
    lock = Lock()

    for num in range(10):
        p = Process(target=f, args=(num,))
        p.start()
    p.join()

How can I tell if this is utilising both of my cores? Currently I'm running Ubuntu 11.04 w/ 3 GB RAM and Intel Core 2 Duo @ 2.2GHz.

The project I'm learning this for is going to be moved to a huge machine in somebody's office, with much more horsepower than I currently have at my disposal. Specifically, the processor will have at least 4 cores, and I want to be sure to get my algorithm to automatically detect and utilise all available cores. Also, that system will potentially be something other than Linux, so are there any common pratfalls that I have to watch for when moving the Multiprocessing module between OS's?

Oh yeah, also, the output of the script looks something like this:

hello world 0
parent process: 29362
process id: 29363 


hello world 1
parent process: 29362
process id: 29364 


hello world 2
parent process: 29362
process id: 29365 

and so on...

So from what I know so far, the PPIDs are all the same because the script, when run, is the parent process that spawns the child processes, each of which is a separate process. So does multiprocessing automatically detect and handle multiple cores, or do I have to tell it where to look? Also, from what I read while searching for a duplicate of this question, I shouldn't spawn more processes than there are cores, because the extras eat up system resources that would otherwise be used for computations.

Thanks in advance for your help, my thesis loves you.

Here's a handy little command I use to monitor my cores from the command line:

watch -d "mpstat -P ALL 1 1 | head -n 12"

Note that the mpstat command must be available on your system; on Ubuntu you can get it by installing the sysstat package.

sudo apt-get install sysstat

If you want to detect the number of available cores from Python, you can do so with the multiprocessing.cpu_count() function. Note that it counts logical cores, so on Intel CPUs with Hyper-Threading the number will be double the number of physical cores. Launching as many processes as you have available cores will usually scale to fully occupy all cores on your machine, as long as the processes have enough work to do and don't get bogged down with communication. Linux's process scheduler will take it from there.
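As a rough sketch, you can detect the core count at runtime and use it to size a worker pool (the `work` function below is a made-up placeholder for whatever CPU-bound task you actually have):

```python
from multiprocessing import Pool, cpu_count

def work(n):
    # Placeholder CPU-bound task: sum of squares below n.
    return sum(i * i for i in range(n))

if __name__ == '__main__':
    # cpu_count() reports logical cores, which may include
    # Hyper-Threading siblings on Intel hardware.
    n_workers = cpu_count()
    with Pool(processes=n_workers) as pool:
        results = pool.map(work, [100000] * n_workers)
    print('workers:', n_workers, 'results:', len(results))
```

Because the pool is sized from cpu_count(), the same script will use 2 workers on your Core 2 Duo and 4 or more on the office machine without any code changes.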

A few things about your code sample. You currently aren't using your lock, even though you create one. And you are only joining on the last process you started. Right now they probably finish so quickly that you won't see an issue, but if any of the earlier processes took longer than the last one, the parent might terminate before they are done.
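One way to restructure that loop (a sketch, with the unused lock dropped) is to keep every Process object you start and join each one before the parent exits:

```python
from multiprocessing import Process
import os

def f(i):
    print('hello world', i)
    print('parent process:', os.getppid())
    print('process id:', os.getpid(), '\n')

if __name__ == '__main__':
    procs = []
    for num in range(10):
        p = Process(target=f, args=(num,))
        p.start()
        procs.append(p)
    for p in procs:
        # Wait for every child, not just the last one started.
        p.join()
```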

Regarding making sure each process ends up on a different core: unfortunately, you can't. That is a decision the operating system's scheduler will make. You are simply writing code that uses multiple processes so that the system can schedule them in parallel. Some may end up on the same core.

As for pitfalls (pratfalls?): one is that your actual code might not really require multiple processes and could instead benefit much more from threading. Also, you have to be very careful about how you share memory with multiprocessing. There is far more overhead in interprocess communication than in inter-thread communication, so it's usually reserved for cases where threading simply will not get you what you need.

If you are on a Unix system, you could try running the top command and looking at how many of your processes are showing up concurrently. Although somewhat empirical, many times just looking at the process list will let you see the multiple workers.

Although looking at your script, I don't see where you are calling multiple processes. You can import multiprocessing.Pool and then map your function onto different processes.
http://docs.python.org/library/multiprocessing.html
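For example, a minimal Pool sketch (square here is just an illustrative function) that farms work out across worker processes and reports which PIDs did the work:

```python
from multiprocessing import Pool
import os

def square(x):
    # Each call may execute in a different worker process.
    return x * x, os.getpid()

if __name__ == '__main__':
    with Pool(processes=4) as pool:
        results = pool.map(square, range(8))
    values = [v for v, _ in results]
    workers = {pid for _, pid in results}
    print(values)          # [0, 1, 4, 9, 16, 25, 36, 49]
    print(len(workers))    # number of distinct worker processes used
```

pool.map preserves input order in its results, so the values come back sorted even though the calls ran in parallel.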
