Python multiprocessing: how to limit the number of waiting processes?
When running a large number of tasks (with large parameters) using Pool.apply_async, the tasks are allocated and go into a waiting state, and there is no limit on the number of waiting tasks. This can end up eating all memory, as in the example below:
import multiprocessing
import numpy as np

def f(a, b):
    return np.linalg.solve(a, b)

def test():
    p = multiprocessing.Pool()
    for _ in range(1000):
        p.apply_async(f, (np.random.rand(1000, 1000), np.random.rand(1000)))
    p.close()
    p.join()

if __name__ == '__main__':
    test()
I'm searching for a way to limit the waiting queue, so that only a limited number of tasks are waiting at any time and Pool.apply_async blocks while the waiting queue is full.
multiprocessing.Pool has a _taskqueue member of type multiprocessing.Queue, which takes an optional maxsize parameter; unfortunately it constructs it without the maxsize parameter set.
I'd recommend subclassing multiprocessing.Pool with a copy-paste of multiprocessing.Pool.__init__ that passes maxsize to the _taskqueue constructor.
Monkey-patching the object (either the pool or the queue) would also work, but you'd have to monkey-patch both pool._taskqueue._maxsize and pool._taskqueue._sem, so it would be quite brittle:
pool._taskqueue._maxsize = maxsize
pool._taskqueue._sem = BoundedSemaphore(maxsize)
Wait if pool._taskqueue is over the desired size:
import multiprocessing
import time
import numpy as np

def f(a, b):
    return np.linalg.solve(a, b)

def test(max_apply_size=100):
    p = multiprocessing.Pool()
    for _ in range(1000):
        p.apply_async(f, (np.random.rand(1000, 1000), np.random.rand(1000)))
        # Stop submitting until the backlog drops below the limit
        while p._taskqueue.qsize() > max_apply_size:
            time.sleep(1)
    p.close()
    p.join()

if __name__ == '__main__':
    test()
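The polling loop above depends on the private _taskqueue attribute. The same back-pressure can be sketched with public API only, by taking a semaphore slot before each submission and releasing it from the task's callbacks. This is a different technique from the one in the answer, shown as a minimal sketch: the names apply_async_bounded, slots and slow_square are hypothetical, and ThreadPool is used only to keep the demo light, on the assumption that multiprocessing.Pool is a drop-in replacement because its apply_async takes the same callback and error_callback arguments in Python 3.

```python
import threading
import time
from multiprocessing.pool import ThreadPool


def apply_async_bounded(pool, slots, func, args=()):
    """Like pool.apply_async(func, args), but block while no slot is free.

    `slots` is a threading.BoundedSemaphore (hypothetical helper argument)
    whose initial value caps the number of pending or running tasks.
    """
    slots.acquire()  # the producer waits here once the limit is reached

    def release(_):
        # Runs in the submitting process when the task succeeds or fails.
        slots.release()

    try:
        return pool.apply_async(func, args,
                                callback=release, error_callback=release)
    except BaseException:
        slots.release()  # submission itself failed, so give the slot back
        raise


def slow_square(x):
    # Stand-in for an expensive task.
    time.sleep(0.01)
    return x * x


if __name__ == "__main__":
    slots = threading.BoundedSemaphore(4)  # at most 4 tasks in flight
    with ThreadPool(2) as pool:
        results = [apply_async_bounded(pool, slots, slow_square, (i,))
                   for i in range(20)]
        print([r.get() for r in results])
```

Because arguments are only created once a slot is free, building them lazily (for example inside the submission loop) keeps the number of large inputs in memory close to the semaphore's initial value.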
Here is a monkey patching alternative to the top answer:
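import queue
import multiprocessing.pool
from multiprocessing.pool import ThreadPool as Pool

class PatchedQueue:
    """
    Wrap stdlib queue and return a Queue(maxsize=...)
    when queue.SimpleQueue is accessed
    """
    def __init__(self, simple_queue_max_size=5000):
        self.simple_max = simple_queue_max_size

    def __getattr__(self, attr):
        if attr == "SimpleQueue":
            return lambda: queue.Queue(maxsize=self.simple_max)
        return getattr(queue, attr)

class BoundedPool(Pool):
    def __init__(self, *args, **kwargs):
        # Pool.__init__ looks up `queue` in the multiprocessing.pool module,
        # not on the class, so swap the patcher in there while the pool is built.
        original = multiprocessing.pool.queue
        multiprocessing.pool.queue = PatchedQueue()
        try:
            super().__init__(*args, **kwargs)
        finally:
            multiprocessing.pool.queue = original

pool = BoundedPool()
pool.apply_async(print, ("something",))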
This is working as expected for Python 3.8, where multiprocessing Pool uses queue.SimpleQueue to set up the task queue. It sounds like the implementation of multiprocessing.Pool may have changed since 2.7.
You could add an explicit Queue with a maxsize parameter and use queue.put() instead of pool.apply_async() in this case. Then worker processes could:
for a, b in iter(queue.get, sentinel):
    ...  # process it
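A fuller sketch of the explicit-queue approach might look like the following. It assumes a multiprocessing.Queue created with maxsize, so that put() blocks the producer while the queue is full. The names SENTINEL, worker, run and max_pending are hypothetical, and the multiplication is only a stand-in for the real computation.

```python
import multiprocessing

SENTINEL = None  # hypothetical end-of-work marker; real tasks must never equal it


def worker(task_queue, result_queue):
    """Consume (a, b) pairs from task_queue until the sentinel arrives."""
    for a, b in iter(task_queue.get, SENTINEL):
        result_queue.put(a * b)  # stand-in for the real computation


def run(tasks, n_workers=2, max_pending=4):
    """Feed tasks through a bounded queue; results come back in arbitrary order."""
    task_queue = multiprocessing.Queue(maxsize=max_pending)
    result_queue = multiprocessing.Queue()
    workers = [
        multiprocessing.Process(target=worker, args=(task_queue, result_queue))
        for _ in range(n_workers)
    ]
    for w in workers:
        w.start()

    submitted = 0
    for task in tasks:
        task_queue.put(task)  # blocks while max_pending tasks are already waiting
        submitted += 1
    for _ in workers:
        task_queue.put(SENTINEL)  # one sentinel per worker so each loop ends

    # Drain the results before joining so workers can flush their output.
    results = [result_queue.get() for _ in range(submitted)]
    for w in workers:
        w.join()
    return results
```

Calling run(((i, i + 1) for i in range(100))) from under an `if __name__ == '__main__':` guard would keep at most max_pending argument tuples waiting in the queue at any time, because the generator is only advanced when put() returns.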
If you want to limit the number of created input arguments/results that are in memory to approximately the number of active worker processes, then you could use the pool.imap*() methods:
#!/usr/bin/env python
import multiprocessing
import numpy as np

def f(a_b):
    return np.linalg.solve(*a_b)

def main():
    # Generator: arguments are created lazily as the pool consumes them
    args = ((np.random.rand(1000, 1000), np.random.rand(1000))
            for _ in range(1000))
    p = multiprocessing.Pool()
    for result in p.imap_unordered(f, args, chunksize=1):
        pass
    p.close()
    p.join()

if __name__ == '__main__':
    main()