
Python multiprocessing: retrieve next result

I'm trying to figure out a good way to use the multiprocessing package in Python 3.6 to run a set of around 100 tasks, with a maximum of 4 of them running simultaneously. I also want to:

  1. repeatedly reap the next completed task from the pool and process its return value, until all tasks have either succeeded or failed;
  2. make exceptions thrown in any given task non-fatal, so I can still access the results from the other tasks.

I don't need to maintain the order of tasks submitted to the pool (i.e., I don't need a queue). The total number of tasks ("100" above) isn't prohibitively huge; e.g., I don't mind submitting them all at once and letting them be queued until workers are available.

I thought that multiprocessing.Pool would be a good fit for this, but I can't seem to find a "get next result" method that I can call iteratively.

Is this something I'm going to have to roll myself from process management primitives? Or can Pool (or another thing I'm missing) support this workflow?

For context, I'm using each worker to invoke a remote process that could take a few minutes, and that has capacity to handle N jobs simultaneously ("4" in my concretized example above).

I came up with the following pattern (shown using 2 workers & 6 jobs, instead of 4 & 100):

import random
import time
from multiprocessing import Pool, TimeoutError
from queue import Queue


def worker(x):
    print("Start: {}".format(x))
    time.sleep(5 * random.random())  # Sleep a random amount of time
    if x == 2:
        raise Exception("Two is bad")
    return x


if __name__ == '__main__':

    with Pool(processes=2) as pool:
        jobs = Queue()
        for i in range(6):
            jobs.put(pool.apply_async(worker, [i]))

        while not jobs.empty():
            j = jobs.get(timeout=1)
            try:
                r = j.get(timeout=0.1)
                print("Done: {}".format(r))
            except TimeoutError:
                jobs.put(j)  # Not ready, try again later
            except Exception as e:
                print("Exception: {}".format(e))

Seems to work pretty well:

Start: 0
Start: 1
Start: 2
Done: 1
Start: 3
Exception: Two is bad
Start: 4
Start: 5
Done: 3
Done: 4
Done: 5
Done: 0

I'll see whether I can make a general utility to manage the queueing for me.

The main shortcoming I think it has is that completed jobs can go unnoticed for a while, while the loop polls uncompleted jobs and waits out their timeouts. Avoiding that would probably require using callbacks; if it becomes a big enough problem, I'll probably add that to my app.
