Asynchronous multiprocessing with a worker pool in Python: how to keep going after timeout?

I would like to run a number of jobs using a pool of processes and apply a given timeout after which a job should be killed and replaced by another working on the next task.

I have tried to use the multiprocessing module, which offers a method to run a pool of workers asynchronously (e.g. using map_async), but there I can only set a "global" timeout after which all processes would be killed.

Is it possible to have an individual timeout after which only a single process that takes too long is killed and a new worker is added to the pool again instead (processing the next task and skipping the one that timed out)?

Here's a simple example to illustrate my problem:

def Check(n):
  import time
  if n % 2 == 0: # select some (arbitrary) subset of processes
    print "%d timeout" % n
    while 1:
      # loop forever to simulate some process getting stuck
      pass
  print "%d done" % n
  return 0

from multiprocessing import Pool
pool = Pool(processes=4)
result = pool.map_async(Check, range(10))
print result.get(timeout=1)    

After the timeout all workers are killed and the program exits. Instead, I would like it to continue with the next subtask. Do I have to implement this behavior myself or are there existing solutions?

Update

It is possible to kill the hanging workers and they are automatically replaced. So I came up with this code:

import multiprocessing

jobs = pool.map_async(Check, range(10))
while 1:
  try:
    print "Waiting for result"
    result = jobs.get(timeout=1)
    break # all clear
  except multiprocessing.TimeoutError: 
    # kill all processes
    for c in multiprocessing.active_children():
      c.terminate()
print result

The problem now is that the loop never exits; even after all tasks have been processed, calling get yields a timeout exception.

The pebble Pool module has been built to solve these types of issues. It supports timeouts on given tasks, allowing them to be detected and easily recovered from.

from pebble import ProcessPool
from concurrent.futures import TimeoutError

with ProcessPool() as pool:
    future = pool.schedule(function, args=[1,2], timeout=5)

try:
    result = future.result()
except TimeoutError as error:
    print "Function took longer than %d seconds" % error.args[1]

For your specific example:

from pebble import ProcessPool
from concurrent.futures import TimeoutError

results = []

with ProcessPool(max_workers=4) as pool:
    future = pool.map(Check, range(10), timeout=5)

    iterator = future.result()

    # iterate over all results, if a computation timed out
    # print it and continue to the next result
    while True:
        try:
            result = next(iterator)
            results.append(result)
        except StopIteration:
            break  
        except TimeoutError as error:
            print "function took longer than %d seconds" % error.args[1] 

print results

Currently, Python does not provide native means to control the execution time of each distinct task in the pool outside the worker itself.
So the easy way is to use wait_procs from the psutil module and implement the tasks as subprocesses.
If nonstandard libraries are not desirable, then you have to implement your own Pool on top of the subprocess module, with the working cycle in the main process, poll()-ing the execution of each worker and performing the required actions.
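A rough sketch of the psutil-based approach might look like the following (the run_with_timeout helper, the batch size of 4 and the 1-second timeout are illustrative assumptions, Check is the function from the question, and multiprocessing.Process is used here instead of the subprocess module for brevity):

import multiprocessing
import psutil

def run_with_timeout(func, args_list, timeout=1, chunk=4):
    # per batch, count how many workers finished and how many were killed
    summary = []
    for start in range(0, len(args_list), chunk):
        batch = args_list[start:start + chunk]
        procs = [multiprocessing.Process(target=func, args=(a,)) for a in batch]
        for p in procs:
            p.start()
        watched = [psutil.Process(p.pid) for p in procs]
        # wait_procs blocks until all processes exit or the timeout expires
        # and returns the (gone, alive) split
        gone, alive = psutil.wait_procs(watched, timeout=timeout)
        for p in alive:
            p.terminate()  # kill the workers that exceeded the timeout
        summary.append((len(gone), len(alive)))
    return summary

if __name__ == '__main__':
    print(run_with_timeout(Check, range(10)))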

As for the updated problem, the pool becomes corrupted if you directly terminate one of the workers (it is a bug in the interpreter implementation, because such behavior should not be allowed): the worker is recreated, but the task is lost and the pool becomes non-joinable. You have to terminate the whole pool and then recreate it again for the next tasks:

import multiprocessing
from multiprocessing import Pool
while True:
    pool = Pool(processes=4)
    jobs = pool.map_async(Check, range(10))
    print "Waiting for result"
    try:
        result = jobs.get(timeout=1)
        break # all clear
    except multiprocessing.TimeoutError: 
        # kill all processes
        pool.terminate()
        pool.join()
print result    

UPDATE

Pebble is an excellent and handy library which solves the issue. Pebble is designed for the asynchronous execution of Python functions, whereas PyExPool is designed for the asynchronous execution of modules and external executables, though both can be used interchangeably.

Another aspect is when third-party dependencies are not desirable; then PyExPool can be a good choice. It is a single-file, lightweight implementation of a multi-process execution pool with per-job and global timeouts, the ability to group jobs into tasks, and other features.
PyExPool can be embedded into your sources and customized, has a permissive Apache 2.0 license and production quality, and is used in the core of one high-load scientific benchmarking framework.

Try a construction where each process is joined with a timeout on a separate thread. That way the main program never gets stuck, and the processes that do get stuck are killed due to the timeout. This technique is a combination of the threading and multiprocessing modules.
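A minimal sketch of this construction, assuming the Check function from the question (the supervise helper name and the 1-second timeout are only illustrative), could look like this:

from multiprocessing import Process
from threading import Thread

def supervise(proc, timeout):
    # each worker process gets its own supervisor thread
    proc.start()
    proc.join(timeout)
    if proc.is_alive():
        proc.terminate()  # kill the worker that exceeded its timeout
        proc.join()

if __name__ == '__main__':
    processes = [Process(target=Check, args=(n,)) for n in range(10)]
    supervisors = [Thread(target=supervise, args=(p, 1.0)) for p in processes]
    for t in supervisors:
        t.start()
    for t in supervisors:
        t.join()  # the main program only waits for the supervisor threads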

Here is my way to maintain a minimum number (x) of threads in memory. It's a combination of the threading and multiprocessing modules. It may be unusual compared to the techniques that respected fellow members have explained above, but it may be worth considering. For the sake of explanation, I am taking a scenario of crawling a minimum of 5 websites at a time.

so here it is:-

#importing dependencies.
from multiprocessing import Process
from threading import Thread
import threading

# Crawler function
def crawler(domain):
    # define crawler technique here.
    output.write(scrapeddata + "\n")
    pass

Next is the threadController function. This function controls the flow of threads into main memory. It will keep activating threads to maintain the threadNum "minimum" limit, i.e. 5. Also, it won't exit until all active threads (activeCount) have finished.

It will maintain a minimum of threadNum (5) startProcess function threads (these threads will eventually start the Processes from the processList while joining them with a timeout of 60 seconds). After starting threadController, there would be 2 threads which are not included in the above limit of 5, i.e. the main thread and the threadController thread itself. That's why threading.activeCount() != 2 has been used.

def threadController():
    print "Thread count before child thread starts is:-", threading.activeCount(), len(processList)
    # starting the first thread. This will make activeCount = 3
    Thread(target = startProcess).start()
    # loop while thread List is not empty OR active threads have not finished up.
    while len(processList) != 0 or threading.activeCount() != 2:
        if (threading.activeCount() < (threadNum + 2) and # if the count of active threads is less than the minimum AND
            len(processList) != 0):                            # processList is not empty
                Thread(target = startProcess).start()         # This line starts the startProcess function as a separate thread **

The startProcess function, as a separate thread, starts Processes from the processList. The purpose of this function (** started as a different thread) is that it becomes a parent thread for the Processes. So when it joins them with a timeout of 60 seconds, this stops the startProcess thread from moving ahead, but it won't stop threadController from performing. This way, threadController will work as required.

def startProcess():
    pr = processList.pop(0)
    pr.start()
    pr.join(60.00) # joining the process with a timeout of 60 seconds (as a float).
    if pr.is_alive():
        pr.terminate() # kill the process if it is still stuck after the timeout.

if __name__ == '__main__':
    # a file holding a list of domains
    domains = open("Domains.txt", "r").read().split("\n")
    output = open("test.txt", "a")
    processList = [] # list of worker processes
    threadNum = 5 # number of thread initiated processes to be run at one time

    # making process List
    for r in range(0, len(domains), 1):
        domain = domains[r].strip()
        p = Process(target = crawler, args = (domain,))
        processList.append(p) # building the list of worker processes.

    # starting the threadController as a separate thread.
    mt = Thread(target = threadController)
    mt.start()
    mt.join() # block here until the threadController thread finishes.

    output.close()
    print "Done"

Besides maintaining a minimum number of threads in memory, my aim was also to have something which could avoid stuck threads or processes in memory. I did this using the timeout. My apologies for any typing mistakes.

I hope this construction would help anyone in this world.

Regards,

Vikas Gautam
