Python multiprocessing: retrieve next result
I'm trying to figure out a good way to use the multiprocessing package in Python 3.6 to run a set of around 100 tasks, with a maximum of 4 of them running simultaneously. I also want to:
I don't need to maintain the order of tasks submitted to the pool (i.e. I don't need a queue). The total number of tasks ("100" above) isn't prohibitively huge, e.g. I don't mind submitting them all at once and letting them be queued until workers are available.
I thought that multiprocessing.Pool would be a good fit for this, but I can't seem to find a "get next result" method that I can call iteratively.
Is this something I'm going to have to roll myself from process management primitives? Or can Pool (or another thing I'm missing) support this workflow?
For context, I'm using each worker to invoke a remote process that could take a few minutes, and that has capacity to handle N jobs simultaneously ("4" in my concretized example above).
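One candidate for the "get next result" call asked about here may be Pool.imap_unordered, which hands back results in completion order rather than submission order. The following is only a sketch under assumptions: worker is a stand-in for the real remote call, safe_worker and run_all are hypothetical helper names, and the 2 workers and 6 jobs are placeholder numbers. Failures are returned as data so that one bad task does not interrupt the loop.

```python
import random
import time
from multiprocessing import Pool


def worker(x):
    """Stand-in for the real job (an assumption); fails for x == 2."""
    time.sleep(0.2 * random.random())  # Simulate a variable amount of work
    if x == 2:
        raise ValueError("Two is bad")
    return x


def safe_worker(x):
    """Hypothetical wrapper: report a failure as data instead of raising."""
    try:
        return (x, True, worker(x))
    except Exception as e:
        # repr() keeps the payload a plain string, which always pickles
        return (x, False, repr(e))


def run_all(n_jobs=6, n_workers=2):
    """Collect (input, ok, payload) tuples in the order the tasks finish."""
    results = []
    with Pool(processes=n_workers) as pool:
        # imap_unordered yields each result as soon as it is ready,
        # regardless of the order in which the tasks were submitted.
        for x, ok, payload in pool.imap_unordered(safe_worker, range(n_jobs)):
            print(("Done: {}" if ok else "Exception: {}").format(payload))
            results.append((x, ok, payload))
    return results


if __name__ == "__main__":
    run_all()
```

Wrapping the worker is a design choice, not a requirement: it keeps the consuming loop a plain for loop and ties every result back to its input.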
I came up with the following pattern (shown using 2 workers & 6 jobs, instead of 4 & 100):
import random
import time
from multiprocessing import Pool, TimeoutError
from queue import Queue

def worker(x):
    print("Start: {}".format(x))
    time.sleep(5 * random.random())  # Sleep a random amount of time
    if x == 2:
        raise Exception("Two is bad")
    return x

if __name__ == '__main__':
    with Pool(processes=2) as pool:
        jobs = Queue()
        for i in range(6):
            jobs.put(pool.apply_async(worker, [i]))
        while not jobs.empty():
            j = jobs.get(timeout=1)
            try:
                r = j.get(timeout=0.1)
                print("Done: {}".format(r))
            except TimeoutError as e:
                jobs.put(j)  # Not ready, try again later
            except Exception as e:
                print("Exception: {}".format(e))
Seems to work pretty well:
Start: 0
Start: 1
Start: 2
Done: 1
Start: 3
Exception: Two is bad
Start: 4
Start: 5
Done: 3
Done: 4
Done: 5
Done: 0
I'll see whether I can make a general utility to manage the queueing for me.
The main shortcoming, I think, is that a completed job can go unnoticed for a while as the loop polls jobs that are still running and waits for each poll to time out. Avoiding that would probably require callbacks; if it becomes a big enough problem, I'll probably add them to my app.
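For reference, a callback-based variant might look roughly like the sketch below. It relies on the callback and error_callback parameters of Pool.apply_async, which run in a thread of the parent process, so a plain queue.Queue is enough to hand results to the main loop. The names submit_all and drain are hypothetical, and worker is again only a stand-in for the real job.

```python
import queue
from multiprocessing import Pool


def worker(x):
    """Stand-in for the real job (an assumption); fails for x == 2."""
    if x == 2:
        raise ValueError("Two is bad")
    return x


def submit_all(pool, func, inputs, done):
    """Submit every input; each callback pushes one tuple onto `done`."""
    for x in inputs:
        pool.apply_async(
            func, (x,),
            # x=x binds the current input to each lambda
            callback=lambda r, x=x: done.put((x, True, r)),
            error_callback=lambda e, x=x: done.put((x, False, e)),
        )


def drain(done, count):
    """Yield `count` finished results, blocking until each one arrives."""
    for _ in range(count):
        yield done.get()


if __name__ == "__main__":
    done = queue.Queue()  # Filled by the callbacks in the parent process
    with Pool(processes=2) as pool:
        submit_all(pool, worker, range(6), done)
        for x, ok, payload in drain(done, 6):
            print(("Done: {}" if ok else "Exception: {}").format(payload))
```

Because done.get() blocks until some job finishes, no job is polled and no timeout is spent on jobs that are still running. Passing a timeout to done.get() may still be worth considering in case a result never arrives.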