Starting a large number of async processes with multiprocessing

If I call apply_async 10,000 times, assuming the OOM killer doesn't interfere, will multiprocessing start them all simultaneously, or will it start them in batches? For example, after every 100 starts, will it wait for 90 to finish starting before starting any more?

Dustin

apply_async() is a method of multiprocessing.Pool objects, and delivers all work to the number of processes you specified when you created the Pool. Only that many tasks can run simultaneously. The rest are saved in queues (or pipes) by the multiprocessing machinery, and automatically doled out to processes as they complete tasks already assigned. Much the same is true of all the Pool methods to which you feed multiple work items.
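You can see that only the Pool's workers ever run tasks with a minimal sketch (not from the original answer; whoami is a hypothetical helper) that records which process handled each of 1,000 work items:

import multiprocessing as mp
import os

def whoami(i):
    # Report the pid of the worker process that ran this task.
    return os.getpid()

if __name__ == "__main__":
    with mp.Pool(4) as pool:
        pids = pool.map(whoami, range(1000))
    # 1000 work items, but only the Pool's 4 worker processes ran them.
    print(len(set(pids)))  # expect 4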

A little more clarification: apply_async() doesn't create, or start, any processes. The processes were created when you called Pool(). They just sit there and wait until you invoke Pool methods (like apply_async()) that ask for some real work to be done.
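A quick way to confirm this (a sketch, not part of the original answer): multiprocessing.active_children() lists live child processes, and shows the workers already exist the moment Pool() returns, before any work is submitted:

import multiprocessing as mp

if __name__ == "__main__":
    pool = mp.Pool(4)
    # The 4 workers already exist, though apply_async() hasn't been called.
    print(len(mp.active_children()))  # expect 4
    pool.close()
    pool.join()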

Example

Play with this:

MAX = 100000

from time import sleep

def f(i):
    sleep(0.01)  # simulate a little real work per task
    return i

def summer(summand):
    # Runs in the main process, in the Pool's result-handler thread,
    # each time a worker finishes a task.
    global SUM, FINISHED
    SUM += summand
    FINISHED += 1

if __name__ == "__main__":
    import multiprocessing as mp
    SUM = 0
    FINISHED = 0
    pool = mp.Pool(4)  # the 4 worker processes are created here

    print("queuing", MAX, "work descriptions")
    for i in range(MAX):
        pool.apply_async(f, args=(i,), callback=summer)
        if i % 1000 == 0:
            print("{}/{}".format(FINISHED, i), end=" ")
    print()

    print("closing pool")
    pool.close()

    print("waiting for processes to end")
    pool.join()

    print("verifying result")
    print("got", SUM, "expected", sum(range(MAX)))

Output is like:

queuing 100000 work descriptions
0/0 12/1000 21/2000 33/3000 42/4000
... stuff chopped for brevity ...
1433/95000 1445/96000 1456/97000 1466/98000 1478/99000
closing pool
waiting for processes to end
... and it waits here "for a long time" ...
verifying result
got 4999950000 expected 4999950000

You can answer most of your questions just by observing this program's behavior. The work items are queued up quickly. By the time we see "closing pool", all the work items have been queued, but only 1478 have completed, and about 98000 are still waiting for some process to work on them.

If you take the sleep(0.01) out of f(), it's much less revealing, because results come back almost as fast as work items are queued.

Memory use remains trivial no matter how you run it, though. The work items here (the name of the function ("f") and its pickled integer argument) are tiny.
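For a rough sense of scale (an illustrative sketch only; the real pool machinery pickles a somewhat larger task tuple), pickling a function name plus a small integer argument comes to just a few dozen bytes:

import pickle

# Approximation of one work description: the callable's name and its argument.
payload = pickle.dumps(("f", (99999,)))
print(len(payload))  # a few dozen bytes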
