Python multiprocessing: maximum number of Pool worker processes?

Question

I am using Python's multiprocessing library and wondering what the maximum number of worker processes I can create is.

E.g. I have defined async.pool = Pool(100), which would allow me to have at most 100 async processes running at the same time, but I have no clue what the real maximum value for this would be.

Does anyone know how to find the max value for my Pool? I'm guessing it depends on CPU or memory.

Answer 1

This is not a complete answer, but the source can help guide us. When you pass maxtasksperchild to Pool, it saves this value as self._maxtasksperchild and only uses it when creating a worker process:

def _repopulate_pool(self):
    """Bring the number of pool processes up to the specified number,
    for use after reaping workers which have exited.
    """
    for i in range(self._processes - len(self._pool)):
        w = self.Process(target=worker,
                         args=(self._inqueue, self._outqueue,
                               self._initializer,
                               self._initargs, self._maxtasksperchild)
                        )

        ...

The worker function receives maxtasksperchild as its maxtasks parameter and uses it like so:

assert maxtasks is None or (type(maxtasks) == int and maxtasks > 0)

which only validates the value and wouldn't change the physical limit, and

while maxtasks is None or (maxtasks and completed < maxtasks):
    try:
        task = get()
    except (EOFError, IOError):
        debug('worker got EOFError or IOError -- exiting')
        break
    ...
    put((job, i, result))
    completed += 1

essentially saving the result of each task and counting it. While you could run into memory issues by saving too many results, you would hit the same error by making a list too large in the first place. In short, the source does not suggest a limit on the number of tasks, as long as the results can fit in memory once released.
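To make the role of maxtasks concrete, here is a minimal single-process sketch of the loop above. It is not the library's code: the name simple_worker, the use of queue.Queue in place of the pool's pipes, and the None sentinel are all assumptions made for illustration.

```python
import queue


def simple_worker(inqueue, outqueue, maxtasks=None):
    """Simplified stand-in for the pool's worker loop (hypothetical helper)."""
    assert maxtasks is None or (type(maxtasks) == int and maxtasks > 0)
    completed = 0
    # Keep taking tasks until the per-worker quota is reached.
    while maxtasks is None or completed < maxtasks:
        task = inqueue.get()
        if task is None:  # assumed sentinel meaning "no more work"
            break
        job, func, args = task
        outqueue.put((job, func(*args)))  # hand the result back
        completed += 1
    return completed  # a real worker would exit here and be replaced


if __name__ == "__main__":
    tasks, results = queue.Queue(), queue.Queue()
    for job in range(5):
        tasks.put((job, pow, (job, 2)))
    tasks.put(None)

    # Each call plays the role of one worker process with a quota of 2 tasks.
    while not tasks.empty():
        handled = simple_worker(tasks, results, maxtasks=2)
        print("worker handled", handled, "tasks")
```

Note that the quota only bounds how many tasks a single worker handles before it stops; it says nothing about how many tasks can be queued in total.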

Does this answer the question? Not entirely. However, on Ubuntu 12.04 with Python 2.7.5 this code, while inadvisable, seems to run just fine for any large max_tasks value. Be warned that it seems to take exponentially longer to run for large values:

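import multiprocessing, time

# Python 2 code, as run by the author on Python 2.7.5.
max_tasks = 10**3  # used both as the pool size and as the number of tasks

def f(x):
    print x**2
    time.sleep(5)
    return x**2

P = multiprocessing.Pool(max_tasks)  # one worker process per task
for x in xrange(max_tasks):
    P.apply_async(f, args=(x,))
P.close()  # no more tasks will be submitted
P.join()   # wait for all workers to finish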

Answer 2

You can use as many workers as you have memory for. That being said, if you set up a pool without passing the processes argument, you'll get as many workers as the machine has CPUs:

From the Pool docs:

processes is the number of worker processes to use. If processes is None then the number returned by os.cpu_count() is used.

If you're doing CPU-intensive work, I wouldn't want more workers in the pool than your CPU count. More workers would force the OS to context-switch your processes in and out, which in turn lowers system performance. Even relying on hyperthreaded cores can, depending on your work, choke the processor.

On the other hand, if your task is like a web server with many concurrent requests that individually do not max out your processor, go ahead and spawn as many workers as you have memory and/or IO capacity for.
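The sizing advice above can be sketched as a small helper. The function name pick_pool_size and the io_multiplier parameter are hypothetical, and the default factor of 4 is an arbitrary placeholder rather than a recommendation; the right number for IO-bound work depends on your memory and IO capacity.

```python
import multiprocessing
import os


def pick_pool_size(cpu_bound, cpu_count=None, io_multiplier=4):
    """Return a worker count following the rule of thumb above (hypothetical helper)."""
    if cpu_count is None:
        cpu_count = os.cpu_count() or 1  # os.cpu_count() may return None
    if cpu_bound:
        return cpu_count  # CPU-bound: no more workers than CPUs
    return cpu_count * io_multiplier  # IO-bound: more workers than CPUs is fine


def square(x):
    return x * x


if __name__ == "__main__":
    size = pick_pool_size(cpu_bound=True)
    with multiprocessing.Pool(processes=size) as pool:
        print(size, "workers:", pool.map(square, range(10)))
```

For the web-server-like case you would call pick_pool_size(cpu_bound=False) and tune io_multiplier while watching memory use.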

maxtasksperchild is something different. This flag forces the pool to release all resources accumulated by a worker once the worker has been used/reused a certain number of times.

If you imagine your workers read from a disk, and this work has some setup overhead, maxtasksperchild will clear that overhead once a worker has completed that many tasks.
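One way to see this recycling is to record which process ran each task. This is only a sketch: whoami and pids_used are made-up names, and chunksize=1 is set on the assumption that you want every item to count as one task against the limit.

```python
import multiprocessing
import os


def whoami(task_id):
    """Return the task id together with the PID of the process that ran it."""
    return task_id, os.getpid()


def pids_used(n_tasks, processes, maxtasksperchild):
    """Run n_tasks trivial tasks and return the set of worker PIDs seen."""
    with multiprocessing.Pool(processes=processes,
                              maxtasksperchild=maxtasksperchild) as pool:
        # chunksize=1 so that every item is handed out as a separate task.
        results = pool.map(whoami, range(n_tasks), chunksize=1)
    return {pid for _, pid in results}


if __name__ == "__main__":
    # Without a limit, the same workers are reused for every task.
    unlimited = pids_used(8, processes=2, maxtasksperchild=None)
    print("no limit:", len(unlimited), "distinct PIDs")
    # With a limit of 2, each worker is replaced after two tasks.
    limited = pids_used(8, processes=2, maxtasksperchild=2)
    print("limit 2: ", len(limited), "distinct PIDs")
```

The second run should typically report more distinct PIDs than the first, because the pool starts fresh workers to replace the ones that reached their limit. The number of workers alive at any moment is still set by processes.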
