Is there a way to limit how much gets submitted to a Pool of workers?

I have a Pool of workers and use apply_async to submit work to them. I do not care about the result of the function applied to each item. The pool seems to accept any number of apply_async calls, no matter how large the data is or how quickly the workers can keep up with the work.

Is there a way to make apply_async block as soon as a certain number of items are waiting to be processed? I am sure the pool internally uses a Queue, so it would be trivial to just use a maximum size for that Queue.

If this is not supported, would it make sense to submit a bug report? This looks like very basic functionality that would be rather trivial to add. It would be a shame if one had to essentially re-implement the whole logic of Pool just to make this work.

Here is some very basic code:
from multiprocessing import Pool

def dowork(item):
    # process the item (for side effects, no return value needed)
    pass

pool = Pool(nprocesses)
for work in getmorework():
    # this should block if too much work is already waiting!
    pool.apply_async(dowork, (work,))
pool.close()
pool.join()
So something like this?
import multiprocessing
import time

worker_count = 4
mp = multiprocessing.Pool(processes=worker_count)
workers = [None] * worker_count
work = iter(getmorework())  # one iterator, reused across the whole loop

while True:
    try:
        for i in range(worker_count):
            # refill a slot once it is empty or its task has finished
            if workers[i] is None or workers[i].ready():
                workers[i] = mp.apply_async(dowork, args=(next(work),))
    except StopIteration:
        break
    time.sleep(1)

mp.close()
mp.join()  # wait for the last batch of tasks to finish
I don't know how fast you're expecting each worker to finish, so the time.sleep may or may not be necessary, or might need a different interval.
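If you'd rather block than poll, one common pattern is to pair apply_async with a semaphore that the completion callback releases. A minimal sketch, reusing dowork and getmorework from the question, with a hypothetical cap of 8 pending items:

from multiprocessing import Pool
from threading import BoundedSemaphore

MAX_PENDING = 8  # hypothetical cap on in-flight tasks

sem = BoundedSemaphore(MAX_PENDING)

def release_slot(_):
    # runs in the pool's result-handler thread when a task finishes or fails
    sem.release()

pool = Pool(4)
for work in getmorework():
    sem.acquire()  # blocks while MAX_PENDING tasks are still pending
    pool.apply_async(dowork, (work,),
                     callback=release_slot,
                     error_callback=release_slot)
pool.close()
pool.join()

Here sem.acquire() stalls the submitting loop once MAX_PENDING tasks are in flight, which is essentially the bounded-queue behavior the question asks for.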
An alternative might be to use a Queue directly:
from multiprocessing import Process, JoinableQueue
from time import sleep
from random import random

def do_work(i):
    print(f"worker {i}")
    sleep(random())
    print(f"done {i}")

def worker():
    while True:
        item = q.get()
        if item is None:
            break
        do_work(item)
        q.task_done()

def generator(n):
    for i in range(n):
        print(f"gen {i}")
        yield i

# 1 = allow generator to get this far ahead
q = JoinableQueue(1)

# 2 = maximum amount of parallelism
procs = [Process(target=worker) for _ in range(2)]
# and get them running
for p in procs:
    p.daemon = True
    p.start()

# schedule 10 items for processing
for item in generator(10):
    q.put(item)

# wait for jobs to finish executing
q.join()

# signal workers to finish up
for p in procs:
    q.put(None)
# wait for workers to actually finish
for p in procs:
    p.join()
Mostly stolen from the example in the documentation for Python's queue module: https://docs.python.org/3/library/queue.html#queue.Queue.join
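One caveat: worker reads the module-level q, which the children only see because they are forked from the parent. Under the spawn start method (the default on Windows, and on macOS since Python 3.8), the queue has to be passed to each child explicitly. A minimal variation, assuming the rest of the example stays the same:

def worker(q):
    while True:
        item = q.get()
        if item is None:
            break
        do_work(item)
        q.task_done()

# hand the queue to each child so this also works under spawn
procs = [Process(target=worker, args=(q,)) for _ in range(2)]

Under spawn the module-level setup code would also need to move under an if __name__ == "__main__": guard.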