What happens when I call multiprocessing.pool.apply_async more times than I have processors

Question

I have the following setup:

results = [f(args) for _ in range(10**3)]

But f(args) takes a long time to compute, so I'd like to throw multiprocessing at it. I would like to do so like this:

pool = mp.Pool(mp.cpu_count() - 1)  # mp.cpu_count() -> 8
results = [pool.apply_async(f, args) for _ in range(10**3)]

Clearly, I don't have 1000 processors on my computer, so my concern is this:
Does the above call result in 1000 processes simultaneously competing for CPU time, or in 7 processes running simultaneously, each computing the next f(args) when its previous call finishes?

I suppose I could do something like pool.map_async(f, (args for _ in range(10**3))) to get the same results, but the purpose of this post is to understand the behavior of pool.apply_async.
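For comparison, here is a rough sketch of the two call styles side by side. The function f below is a hypothetical stand-in for the slow function, and the helper names are made up for illustration. One thing worth noting: apply_async unpacks its second argument as the positional arguments of f, while map_async passes each item of the iterable as a single argument, so the two only match when the arguments are wrapped accordingly. Both return handles, and the actual values come from calling .get().

```python
import multiprocessing as mp


def f(x):
    """Hypothetical stand-in for the slow function in the question."""
    return x + 1


def via_apply_async(pool, arg, n):
    # One AsyncResult handle per call; (arg,) is unpacked into f(arg).
    handles = [pool.apply_async(f, (arg,)) for _ in range(n)]
    # .get() blocks until that particular task has finished.
    return [h.get() for h in handles]


def via_map_async(pool, arg, n):
    # A single handle for the whole batch; each item is passed to f as-is.
    return pool.map_async(f, [arg] * n).get()


if __name__ == "__main__":
    with mp.Pool(2) as pool:
        a = via_apply_async(pool, 3, 10)
        b = via_map_async(pool, 3, 10)
    print(a == b)
```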

Answer 1

You'll never have more processes running than there are workers in your pool (in your case, mp.cpu_count() - 1). If you call apply_async while all the workers are busy, the task is queued and executed as soon as a worker frees up. You can see this with a simple test program:

#!/usr/bin/python

import time
import multiprocessing as mp

def worker(chunk):
    print('working')
    time.sleep(10)
    return

def main():
    pool = mp.Pool(2)  # Only two workers
    for n in range(0, 8):
        pool.apply_async(worker, (n,))
        print("called it")
    pool.close()
    pool.join()

if __name__ == '__main__':
    main()

The output is like this:

called it
called it
called it
called it
called it
called it
called it
called it
working
working
<delay>
working
working
<delay>
working 
working
<delay>
working
working
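Another way to check the same thing, as a minimal sketch: have each task return the PID of the process that ran it, then count the distinct PIDs. The names report_pid and run_tasks are made up for this illustration, and the short sleep is only there to keep both workers busy.

```python
import multiprocessing as mp
import os
import time


def report_pid(task_id):
    """Pretend to work briefly, then report which process ran the task."""
    time.sleep(0.05)
    return task_id, os.getpid()


def run_tasks(num_workers, num_tasks):
    """Submit num_tasks jobs to a pool of num_workers and collect the results."""
    with mp.Pool(num_workers) as pool:
        handles = [pool.apply_async(report_pid, (i,)) for i in range(num_tasks)]
        # Collect every result before the pool is torn down on exit.
        return [h.get() for h in handles]


if __name__ == "__main__":
    results = run_tasks(num_workers=2, num_tasks=8)
    distinct_pids = {pid for _, pid in results}
    print(f"{len(results)} tasks ran in {len(distinct_pids)} worker process(es)")
```

However many tasks are submitted, the number of distinct PIDs should not exceed the pool size.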

Answer 2

The number of worker processes is wholly controlled by the argument to mp.Pool(). So if mp.cpu_count() returns 8 on your box, mp.Pool(mp.cpu_count() - 1) creates 7 worker processes.

All pool methods (apply_async() among them) then use no more than that many worker processes. Under the covers, arguments are pickled in the main program and sent over an inter-process pipe to the worker processes. This hidden machinery effectively creates a work queue, off of which the fixed number of worker processes pull descriptions of work to do (function name + arguments).
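To make the work-queue idea concrete, here is a toy model of it. This is not how mp.Pool is actually implemented, just a sketch of the same pattern built from mp.Process and mp.Queue, and every name in it (worker_loop, run_with_fixed_workers, square) is invented for the illustration. A fixed number of processes loop over a shared queue of (task_id, function, arguments) descriptions until they see a None sentinel.

```python
import multiprocessing as mp


def square(x):
    """Hypothetical task; must be a module-level function so it can be pickled."""
    return x * x


def worker_loop(task_queue, result_queue):
    """Pull (task_id, func, args) descriptions until a None sentinel arrives."""
    for task_id, func, args in iter(task_queue.get, None):
        result_queue.put((task_id, func(*args)))


def run_with_fixed_workers(func, arg_tuples, num_workers):
    """Run func(*args) for each tuple in the list arg_tuples on num_workers processes."""
    task_queue = mp.Queue()
    result_queue = mp.Queue()
    workers = [
        mp.Process(target=worker_loop, args=(task_queue, result_queue))
        for _ in range(num_workers)
    ]
    for w in workers:
        w.start()
    # Everything put on the queue is pickled and sent to whichever worker is free.
    for task_id, args in enumerate(arg_tuples):
        task_queue.put((task_id, func, args))
    # One sentinel per worker tells each loop to stop.
    for _ in workers:
        task_queue.put(None)
    # Drain all results before joining so no worker blocks on a full pipe.
    results = dict(result_queue.get() for _ in arg_tuples)
    for w in workers:
        w.join()
    return [results[i] for i in range(len(arg_tuples))]


if __name__ == "__main__":
    print(run_with_fixed_workers(square, [(i,) for i in range(5)], num_workers=2))
```

No matter how many tasks are queued, only num_workers processes ever exist; the extra tasks simply wait in the queue.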

Other than that, it's all just magic ;-)

Answer 1
Score 11, accepted, 2014-05-05 20:50:32

Answer 2
Score 6, 2014-05-05 20:50:24
