Python's multiprocessing.Pool spawns new processes

I have a simple question about the multiprocessing module. I am using multiprocessing.Pool's map() function to speed up execution of my own code on my local machine. However, this code runs inside an iterative loop, and I find that additional Python processes are spawned on my machine with every iteration of the loop. (This is a problem because the system slowly grinds to a halt.) Here's a simple example:

from multiprocessing import Pool
import os

nthreads = 2
for ii in xrange(5):
    pool = Pool(processes=nthreads)  # (in my code, Pool is inside a pickleable function.)
    runningProcesses = os.popen('ps | grep ython').readlines()
    nproc = len(runningProcesses)
    print "After iteration %i there were %i Python processes running!" % (ii, nproc)

The output is:

After iteration 0 there were 5 Python processes running!
After iteration 1 there were 7 Python processes running!
After iteration 2 there were 9 Python processes running!
After iteration 3 there were 11 Python processes running!
After iteration 4 there were 13 Python processes running!

How should I arrange my code to avoid spawning many new Python processes? I am running Python 2.7.6, which has multiprocessing v0.70a1, on a 4-core MacBook Pro running OS X 10.8.5.

Move pool = Pool(processes=nthreads) above the for loop.

As discussed in the comments, the worker processes in each pool are never closed or joined, so they never terminate. Every pass through the loop therefore leaves another nthreads workers running. The top answer to "Python multiprocessing pool, join; not waiting to go on?" shows how to clean up a pool when you no longer need it.
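A minimal sketch of that cleanup, assuming the pool really has to be created inside the loop. The helper name run_iteration and the use of math.sqrt as the mapped function are placeholders for illustration, not part of the original code. The close()/join() calls exist in both Python 2.7 and Python 3.

```python
from multiprocessing import Pool
import math


def run_iteration(values, nworkers=2):
    """Create a pool, use it once, then shut its workers down."""
    pool = Pool(processes=nworkers)
    try:
        # math.sqrt stands in for the real per-item work (hypothetical).
        results = pool.map(math.sqrt, values)
    finally:
        pool.close()  # no further tasks will be submitted
        pool.join()   # wait for the worker processes to exit
    return results


if __name__ == "__main__":
    for ii in range(5):
        print(run_iteration([1.0, 4.0, 9.0]))
```

Because the workers are joined before the function returns, each iteration should end with the same number of Python processes it started with.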

As a side note, if you create large numbers of workers and use them for very short jobs, you may find that performance suffers, because the OS incurs overhead each time it creates and destroys a process. If that is the case, consider using a single Pool throughout your application.
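The single-pool arrangement could look roughly like the sketch below. The function name run_all, the batches argument, and math.sqrt as the mapped function are assumptions made for the example; the point is only that Pool(...) is called once, outside the loop, and shut down once at the end.

```python
from multiprocessing import Pool
import math


def run_all(batches, nworkers=2):
    """Map over every batch using one shared pool of workers."""
    pool = Pool(processes=nworkers)  # created once, outside the loop
    try:
        # The same worker processes are reused for every batch.
        return [pool.map(math.sqrt, batch) for batch in batches]
    finally:
        pool.close()  # no further tasks will be submitted
        pool.join()   # wait for the worker processes to exit


if __name__ == "__main__":
    print(run_all([[1.0, 4.0], [9.0, 16.0]]))
```

With this layout the number of worker processes stays at nworkers for the whole run, regardless of how many iterations the loop performs.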
