What happens when I call multiprocessing.Pool.apply_async more times than I have processors?
I have the following setup:
    results = [f(args) for _ in range(10**3)]
But f(args) takes a long time to compute, so I'd like to throw multiprocessing at it. I would like to do so by doing:
    pool = mp.Pool(mp.cpu_count() - 1)  # mp.cpu_count() -> 8
    results = [pool.apply_async(f, (args,)) for _ in range(10**3)]
Clearly, I don't have 1000 processors on my computer, so my concern is this: does the above call result in 1000 processes simultaneously competing for CPU time, or in 7 processes running simultaneously, each computing the next f(args) when its previous call finishes?
I suppose I could do something like pool.map_async(f, (args for _ in range(10**3))) to get the same results, but the purpose of this post is to understand the behavior of pool.apply_async.
You'll never have more processes running than there are workers in your pool (in your case, mp.cpu_count() - 1). If you call apply_async while all the workers are busy, the task is queued and executed as soon as a worker frees up. You can see this with a simple test program:
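    #!/usr/bin/python
    import time
    import multiprocessing as mp

    def worker(chunk):
        # Simulate a slow task; the argument itself is not used.
        print('working')
        time.sleep(10)

    def main():
        pool = mp.Pool(2)  # Only two workers
        for n in range(0, 8):
            # Returns immediately; the task waits in the pool's queue.
            pool.apply_async(worker, (n,))
            print("called it")
        pool.close()  # No more tasks will be submitted
        pool.join()   # Wait for all queued tasks to finish

    if __name__ == '__main__':
        main()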
The output is like this:
called it
called it
called it
called it
called it
called it
called it
called it
working
working
<delay>
working
working
<delay>
working
working
<delay>
working
working
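If you also want the return values, a variation on the same idea is sketched below. It is only an illustration: the names worker and run_tasks, the 0.2 second sleep, and the task counts are made up for this example. Each task returns the PID of the process that ran it, so you can check that the number of distinct PIDs never exceeds the pool size, however many tasks were submitted.

```python
import multiprocessing as mp
import os
import time


def worker(n):
    """Simulate a slow task and report which process ran it."""
    time.sleep(0.2)
    return n, os.getpid()


def run_tasks(pool_size, n_tasks):
    """Submit n_tasks to a pool of pool_size workers.

    Returns the task inputs (in submission order) and the set of
    worker PIDs that handled them.
    """
    with mp.Pool(pool_size) as pool:
        handles = [pool.apply_async(worker, (n,)) for n in range(n_tasks)]
        # get() blocks until that particular task has finished.
        pairs = [h.get() for h in handles]
    results = [n for n, _ in pairs]
    pids = {pid for _, pid in pairs}
    return results, pids


if __name__ == "__main__":
    results, pids = run_tasks(pool_size=2, n_tasks=8)
    print(results)    # task inputs, in submission order
    print(len(pids))  # number of distinct worker processes used
```

Calling get() on each AsyncResult in submission order keeps the results aligned with the inputs, even though the tasks themselves may finish in a different order.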
The number of worker processes is wholly controlled by the argument to mp.Pool(). So if mp.cpu_count() returns 8 on your box, mp.Pool(mp.cpu_count() - 1) creates 7 worker processes.
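A rough way to check this yourself is to count the child processes right after the pool is created. The helper names safe_pool_size and count_workers are hypothetical, invented for this sketch, and it assumes the script has no other child processes running. It also guards against a single-core machine, where mp.cpu_count() - 1 would be 0 and Pool would raise a ValueError.

```python
import multiprocessing as mp


def safe_pool_size():
    """cpu_count() - 1, but never below 1 (Pool rejects a size of 0)."""
    return max(1, mp.cpu_count() - 1)


def count_workers(pool_size):
    """Create a pool and count the child processes it started."""
    with mp.Pool(pool_size):
        # The workers are started when the pool is constructed,
        # before any task has been submitted.
        return len(mp.active_children())


if __name__ == "__main__":
    print(safe_pool_size())
    print(count_workers(3))  # expected to match the requested pool size
```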
All Pool methods (apply_async() among them) then use no more than that many worker processes. Under the covers, arguments are pickled in the main program and sent over an inter-process pipe to the worker processes. This hidden machinery effectively creates a work queue, from which the fixed number of worker processes pull descriptions of work to do (function name + arguments).
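One practical consequence of the pickling step is that whatever you submit has to be picklable. The sketch below does not use a pool at all; it only imitates the serialization round trip with the pickle module, and the names square and can_pickle are made up for illustration. A module-level function pickles by reference to its name, while a lambda has no importable name and is rejected.

```python
import pickle


def square(x):
    return x * x


def can_pickle(obj):
    """Return True if obj survives pickle.dumps, False otherwise."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError, TypeError):
        return False


if __name__ == "__main__":
    task = (square, (3,))  # a "description of work": function + arguments
    # Imitate what happens between the main program and a worker.
    func, args = pickle.loads(pickle.dumps(task))
    print(func(*args))

    print(can_pickle(task))
    print(can_pickle((lambda x: x * x, (3,))))  # lambdas cannot be pickled
```

This is why functions passed to apply_async are normally defined at module level rather than inline.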
Other than that, it's all just magic ;-)