Python Multiprocessing - terminate / restart worker process

I have a bunch of long-running processes that I would like to split up into multiple processes. That part I can do with no problem. The issue I run into is that sometimes these processes go into a hung state. To address this, I would like to be able to set a time threshold for each task that a process is working on. When that time threshold is exceeded, I would like to restart or terminate the task.

Originally my code was very simple and used a process pool. However, with the pool I could not figure out how to retrieve the processes inside the pool, never mind how to restart or terminate a process in the pool.

I have resorted to using a queue and process objects, as illustrated in this example ( https://pymotw.com/2/multiprocessing/communication.html#passing-messages-to-processes ), with some changes.

My attempts to figure this out are in the code below. In its current state the process does not actually get terminated. Beyond that, I cannot figure out how to get the process to move on to the next task after the current task is terminated. Any suggestions or help are appreciated; perhaps I'm going about this the wrong way.


    import multiprocess
    import time

    class Consumer(multiprocess.Process):
        def __init__(self, task_queue, result_queue, startTimes, name=None):
            # Initialise the base Process first so that self.name exists
            multiprocess.Process.__init__(self)
            if name:
                self.name = name
            print('created process: {0}'.format(self.name))
            self.task_queue = task_queue
            self.result_queue = result_queue
            self.startTimes = startTimes

        def stopProcess(self):
            elapseTime = time.time() - self.startTimes[self.name]
            print('killing process {0} {1}'.format(self.name, elapseTime))
            self.terminate()
            # now want to get the process to start processing another job

        def run(self):
            """The process subclass calls this on a separate process."""
            proc_name = self.name
            print(proc_name)
            while True:
                # pulling the next task off the queue and starting it
                # on the current process.
                task = self.task_queue.get()
                # mark the item as fetched so that tasks.join() can return
                self.task_queue.task_done()
                if task is None:
                    # Poison pill means shutdown
                    #print '%s: Exiting' % proc_name
                    break
                self.startTimes[proc_name] = time.time()
                answer = task()
                self.result_queue.put(answer)

    class Task(object):
        def __init__(self, a, b, startTimes):
            self.a = a
            self.b = b
            self.startTimes = startTimes
            self.taskName = 'taskName_{0}_{1}'.format(self.a, self.b)

        def __call__(self):
            import time
            import os

            print('new job in process pid: {0} {1}'.format(os.getpid(), self.taskName))

            if self.a == 2:
                time.sleep(20000) # simulate a hung process
            else:
                time.sleep(3) # pretend to take some time to do the work
            return '%s * %s = %s' % (self.a, self.b, self.a * self.b)

        def __str__(self):
            return '%s * %s' % (self.a, self.b)

    if __name__ == '__main__':
        # Establish communication queues
        # tasks = this is the work queue and results is for results or completed work
        tasks = multiprocess.JoinableQueue()
        results = multiprocess.Queue()

        #parentPipe, childPipe = multiprocess.Pipe(duplex=True)
        mgr = multiprocess.Manager()
        startTimes = mgr.dict()

        # Start consumers
        numberOfProcesses = 4
        processObjs = []
        for processNumber in range(numberOfProcesses):
            processObj = Consumer(tasks, results, startTimes)
            processObjs.append(processObj)

        for process in processObjs:
            process.start()

        # Enqueue jobs
        num_jobs = 30
        for i in range(num_jobs):
            tasks.put(Task(i, i + 1, startTimes))

        # Add a poison pill for each process object
        for i in range(numberOfProcesses):
            tasks.put(None)

        # process monitor loop
        killProcesses = {}
        executing = True
        while executing:
            allDead = True
            for process in processObjs:
                name = process.name
                #status = consumer.status.getStatusString()
                status = process.is_alive()
                pid = process.ident
                elapsedTime = 0
                if name in startTimes:
                    elapsedTime = time.time() - startTimes[name]
                if elapsedTime > 10:
                    process.stopProcess()

                print("{0} - {1} - {2} - {3}".format(name, status, pid, elapsedTime))
                if allDead and status:
                    allDead = False
            if allDead:
                executing = False
            time.sleep(1)  # poll once per second instead of busy-looping

        # Wait for all of the tasks to finish
        tasks.join()

        # Start printing results
        while num_jobs:
            result = results.get()
            print('Result: {0}'.format(result))
            num_jobs -= 1

A much simpler solution than reimplementing the Pool is to keep using a Pool and design a mechanism that times out the function you are running. For instance:

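from time import sleep
import signal

class TimeoutError(Exception):
    pass

def handler(signum, frame):
    raise TimeoutError()

def run_with_timeout(func, *args, timeout=10, **kwargs):
    signal.signal(signal.SIGALRM, handler)
    signal.alarm(timeout)  # deliver SIGALRM after `timeout` seconds
    try:
        res = func(*args, **kwargs)
    except TimeoutError as exc:
        res = exc
    finally:
        signal.alarm(0)  # cancel any pending alarm
    return res

def test():
    sleep(4)  # placeholder workload (assumed), longer than the 1 s timeout

if __name__ == "__main__":
    import multiprocessing as mp

    p = mp.Pool()
    print(p.apply_async(run_with_timeout, args=(test,),
                        kwds={"timeout": 1}).get())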

signal.alarm sets a timeout. When the timeout expires, the handler runs and raises TimeoutError, which stops the execution of your function.
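The same alarm mechanism can also be wrapped in a context manager. The following is only a minimal sketch, assuming a Unix system and that it runs in the main thread of the process; `TaskTimeout`, `time_limit` and `limited_sleep` are hypothetical names introduced for illustration. It uses `signal.setitimer` instead of `signal.alarm` so that fractional seconds are accepted, and it restores the previous handler afterwards.

```python
import signal
import time
from contextlib import contextmanager


class TaskTimeout(Exception):
    """Raised when the guarded block runs longer than allowed."""


@contextmanager
def time_limit(seconds):
    """Raise TaskTimeout if the with-block takes longer than `seconds`."""
    def _on_alarm(signum, frame):
        raise TaskTimeout("exceeded {} s".format(seconds))

    previous = signal.signal(signal.SIGALRM, _on_alarm)
    signal.setitimer(signal.ITIMER_REAL, seconds)  # arm the timer
    try:
        yield
    finally:
        signal.setitimer(signal.ITIMER_REAL, 0)  # cancel a pending alarm
        signal.signal(signal.SIGALRM, previous)  # restore the old handler


def limited_sleep(duration, limit):
    """Sleep for `duration` seconds, giving up after `limit` seconds."""
    try:
        with time_limit(limit):
            time.sleep(duration)
        return "finished"
    except TaskTimeout:
        return "timed out"
```

Because the signal is delivered to the process that armed the timer, the guard has to be set up inside the worker function itself, as run_with_timeout does.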

EDIT: If you are using a Windows system, it seems to be a bit more complicated, as signal does not implement SIGALRM there. Another solution is to use the C-level Python API. This code has been adapted from this SO answer, with some changes to make it work on 64-bit systems. I have only tested it on Linux, but it should work the same on Windows.

import threading
import ctypes
from time import sleep

class TimeoutError(Exception):
    pass

def run_with_timeout(func, *args, timeout=10, **kwargs):
    interupt_tid = int(threading.get_ident())

    def interupt_thread():
        # Call the low level C python api using ctypes. tid must be converted
        # to c_long to be valid.
        res = ctypes.pythonapi.PyThreadState_SetAsyncExc(
            ctypes.c_long(interupt_tid), ctypes.py_object(TimeoutError))
        if res == 0:
            raise ValueError("invalid thread id")
        elif res != 1:
            # "if it returns a number greater than one, you're in trouble,
            # and you should call it again with exc=NULL to revert the effect"
            ctypes.pythonapi.PyThreadState_SetAsyncExc(
                ctypes.c_long(interupt_tid), None)
            raise SystemError("PyThreadState_SetAsyncExc failed")

    # The timer thread injects TimeoutError into the calling thread. The
    # exception is only delivered once that thread executes Python bytecode
    # again, so a blocking C call is not interrupted midway.
    timer = threading.Timer(timeout, interupt_thread)
    timer.start()
    try:
        res = func(*args, **kwargs)
    except TimeoutError as exc:
        res = exc
    finally:
        timer.cancel()
    return res

def test():
    sleep(4)  # placeholder workload (assumed), between the 1 s and 5 s timeouts

if __name__ == "__main__":
    import multiprocessing as mp

    p = mp.Pool()
    print(p.apply_async(run_with_timeout, args=(test,),
                        kwds={"timeout": 1}).get())
    print(p.apply_async(run_with_timeout, args=(test,),
                        kwds={"timeout": 5}).get())

I generally recommend against subclassing multiprocessing.Process, as it leads to code that is hard to read.

I'd rather encapsulate your logic in a function and run it in a separate process. This keeps the code much cleaner and more intuitive.
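As a rough illustration of that idea using only the standard library, the sketch below runs one function call in its own process and terminates the process if no result arrives in time. `run_in_process` and `_call_and_report` are hypothetical helpers, not part of any library; the sketch assumes the function and its arguments can be handed to a child process, and an exception raised inside the child is not forwarded, so it also shows up as a timeout.

```python
import multiprocessing as mp
import queue
import time


def _call_and_report(result_queue, func, args):
    """Runs in the child process: call func and send its result back."""
    result_queue.put(func(*args))


def run_in_process(func, args=(), timeout=10.0, context=None):
    """Run func(*args) in a separate process; kill it after `timeout` seconds."""
    ctx = context or mp.get_context()
    result_queue = ctx.Queue()
    worker = ctx.Process(target=_call_and_report,
                         args=(result_queue, func, args))
    worker.start()
    try:
        # Block until the child reports a result or the time threshold passes.
        return result_queue.get(timeout=timeout)
    except queue.Empty:
        worker.terminate()  # the task is hung or too slow: stop the process
        raise TimeoutError("task exceeded {} seconds".format(timeout))
    finally:
        worker.join()  # reap the child in both cases
```

Calling `run_in_process(time.sleep, (60,), timeout=1)` should raise TimeoutError after about one second, and the caller is then free to submit the next task. Starting one process per task is slower than reusing pool workers, which is the price for being able to kill a single task.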

Nevertheless, rather than reinventing the wheel, I'd recommend using a library that already solves the issue for you, such as Pebble or billiard.

For example, the Pebble library makes it easy to set timeouts on processes running independently or within a Pool.

Running your function within a separate process with a timeout:

from pebble import concurrent
from concurrent.futures import TimeoutError

@concurrent.process(timeout=10)
def function(foo, bar=0):
    return foo + bar

future = function(1, bar=2)

try:
    result = future.result()  # blocks until results are ready
except TimeoutError as error:
    print("Function took longer than %d seconds" % error.args[1])

Same example, but with a process Pool:

from pebble import ProcessPool

with ProcessPool(max_workers=5, max_tasks=10) as pool:
    future = pool.schedule(function, args=[1], timeout=10)

    try:
        result = future.result()  # blocks until results are ready
    except TimeoutError as error:
        print("Function took longer than %d seconds" % error.args[1])

In both cases, the timed-out process will be automatically terminated for you.
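For contrast, the standard library's concurrent.futures does not do this: a timeout passed to `future.result()` only stops the wait, and the worker keeps running the task. The sketch below illustrates that difference under stated assumptions; `wait_briefly` is a hypothetical helper introduced here, and `time.sleep` stands in for a slow task.

```python
import multiprocessing as mp
import time
from concurrent.futures import ProcessPoolExecutor, TimeoutError


def wait_briefly(duration, patience, context=None):
    """Submit a sleep of `duration` seconds and wait at most `patience`.

    Returns (timed_out, still_running).
    """
    with ProcessPoolExecutor(max_workers=1, mp_context=context) as executor:
        future = executor.submit(time.sleep, duration)
        try:
            future.result(timeout=patience)
            return False, False
        except TimeoutError:
            # Only the wait was abandoned; the worker is still busy.
            return True, future.running()
```

Leaving the `with` block still waits for the sleeping worker to finish, which is exactly the behaviour a hung task would turn into a deadlock.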

For long-running processes and/or long iterators, spawned workers might hang after some time. To prevent this, there are two built-in techniques:

  • Restart workers after they have delivered maxtasksperchild tasks from the queue.
  • Pass timeout to pool.imap.next(), catch the TimeoutError, and finish the rest of the work in another pool.
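The first technique can be observed in isolation. The sketch below is only an illustration: `worker_pids` is a hypothetical helper, and it relies on each `pool.apply` call counting as one task. With a single worker and `maxtasksperchild=2`, the reported PID should change after every second call, because the pool replaces the worker once it has served two tasks.

```python
import multiprocessing as mp
import os


def worker_pids(n_tasks, maxtasksperchild, context=None):
    """Run n_tasks trivial tasks and return the PID that served each one."""
    ctx = context or mp.get_context()
    with ctx.Pool(processes=1, maxtasksperchild=maxtasksperchild) as pool:
        # os.getpid runs inside the worker; one apply() call is one task.
        return [pool.apply(os.getpid) for _ in range(n_tasks)]
```

For example, `worker_pids(6, maxtasksperchild=2)` is expected to return six PIDs containing three distinct values. In a script, call it from under an `if __name__ == "__main__":` guard.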

The following wrapper implements both, as a generator. This also works when replacing stdlib multiprocessing with multiprocess.

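import multiprocessing as mp

def imap(
    func,
    iterable,
    *,
    processes=None,
    maxtasksperchild=100,  # assumed default; the original value is not shown
    timeout=None,
    context=mp,  # assumed default; any object providing Pool() works
):
    """Multiprocessing imap, restarting workers after maxtasksperchild tasks to avoid zombies.

    Example:
        >>> list(imap(str, range(5)))
        ['0', '1', '2', '3', '4']

    Raises:
        mp.TimeoutError: if the next result cannot be returned within timeout seconds.

    Yields:
        Ordered results as they come in.
    """
    with context.Pool(
        processes=processes,
        maxtasksperchild=maxtasksperchild,
    ) as pool:
        it = pool.imap(func, iterable)
        while True:
            try:
                yield it.next(timeout)
            except StopIteration:
                return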

To catch the TimeoutError:

>>> import time
>>> iterable = list(range(10))
>>> results = []
>>> try:
...     for i, result in enumerate(imap(time.sleep, iterable, processes=2, timeout=2)):
...         results.append(result)
... except mp.TimeoutError:
...     print("Failed to process the following subset of iterable:", iterable[i:])
Failed to process the following subset of iterable: [2, 3, 4, 5, 6, 7, 8, 9]

