
Python Multiprocessing - terminate / restart worker process

I have a bunch of long-running jobs that I would like to split across multiple processes. That part I can do without a problem. The issue I run into is that sometimes these processes go into a hung state. To address this, I would like to be able to set a time threshold for each task that a process is working on; when that threshold is exceeded, I would like to restart or terminate the task.

Originally my code was very simple, using a process pool; however, with the pool I could not figure out how to retrieve the processes inside the pool, never mind how to restart or terminate a process in the pool.
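For reference, the Pool version looked roughly like this (a reconstruction, not the exact original; the timeout value is illustrative). AsyncResult.get(timeout=...) raises multiprocessing.TimeoutError, but it only abandons the wait: the hung worker keeps running, and Pool exposes no public handle to terminate just that process.

    import multiprocessing

    def work(a, b):
        return a * b

    if __name__ == '__main__':
        pool = multiprocessing.Pool(processes=4)
        async_results = [pool.apply_async(work, (i, i + 1))
                         for i in range(30)]
        for res in async_results:
            try:
                print(res.get(timeout=10))  # give up waiting after 10 seconds
            except multiprocessing.TimeoutError:
                # the wait is abandoned, but the worker keeps running the
                # hung task; Pool has no public API to terminate just it
                print('task timed out')
        pool.close()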

I have resorted to using a queue and process objects, as illustrated in this example (https://pymotw.com/2/multiprocessing/communication.html#passing-messages-to-processes), with some changes.

My attempts to figure this out are in the code below. In its current state the process does not actually get terminated, and beyond that I cannot figure out how to get the process to move on to the next task after the current task is terminated. Any suggestions or help appreciated; perhaps I'm going about this the wrong way.

Thanks

    import multiprocess
    import time

    class Consumer(multiprocess.Process):
        def __init__(self, task_queue, result_queue, startTimes, name=None):
            multiprocess.Process.__init__(self)
            if name:
                self.name = name
            print 'created process: {0}'.format(self.name)
            self.task_queue = task_queue
            self.result_queue = result_queue
            self.startTimes = startTimes

        def stopProcess(self):
            elapseTime = time.time() - self.startTimes[self.name]
            print 'killing process {0} {1}'.format(self.name, elapseTime)
            self.task_queue.cancel_join_thread()
            self.terminate()
            # now want to get the process to start processing another job

        def run(self):
            '''
            The process subclass calls this on a separate process.
            '''    
            proc_name = self.name
            print proc_name
            while True:
                # pulling the next task off the queue and starting it
                # on the current process.
                task = self.task_queue.get()
                self.task_queue.cancel_join_thread()

                if task is None:
                    # Poison pill means shutdown
                    #print '%s: Exiting' % proc_name
                    self.task_queue.task_done()
                    break
                self.startTimes[proc_name] = time.time()
                answer = task()
                self.task_queue.task_done()
                self.result_queue.put(answer)
            return

    class Task(object):
        def __init__(self, a, b, startTimes):
            self.a = a
            self.b = b
            self.startTimes = startTimes
            self.taskName = 'taskName_{0}_{1}'.format(self.a, self.b)

        def __call__(self):
            import time
            import os

            print 'new job in process pid:', os.getpid(), self.taskName

            if self.a == 2:
                time.sleep(20000) # simulate a hung process
            else:
                time.sleep(3) # pretend to take some time to do the work
            return '%s * %s = %s' % (self.a, self.b, self.a * self.b)

        def __str__(self):
            return '%s * %s' % (self.a, self.b)

    if __name__ == '__main__':
        # Establish communication queues
        # tasks = this is the work queue and results is for results or completed work
        tasks = multiprocess.JoinableQueue()
        results = multiprocess.Queue()

        #parentPipe, childPipe = multiprocess.Pipe(duplex=True)
        mgr = multiprocess.Manager()
        startTimes = mgr.dict()

        # Start consumers
        numberOfProcesses = 4
        processObjs = []
        for processNumber in range(numberOfProcesses):
            processObj = Consumer(tasks, results, startTimes)
            processObjs.append(processObj)

        for process in processObjs:
            process.start()

        # Enqueue jobs
        num_jobs = 30
        for i in range(num_jobs):
            tasks.put(Task(i, i + 1, startTimes))

        # Add a poison pill for each process object
        for i in range(numberOfProcesses):
            tasks.put(None)

        # process monitor loop, 
        killProcesses = {}
        executing = True
        while executing:
            allDead = True
            for process in processObjs:
                name = process.name
                #status = consumer.status.getStatusString()
                status = process.is_alive()
                pid = process.ident
                elapsedTime = 0
                if name in startTimes:
                    elapsedTime = time.time() - startTimes[name]
                if elapsedTime > 10:
                    process.stopProcess()

                print "{0} - {1} - {2} - {3}".format(name, status, pid, elapsedTime)
                if  allDead and status:
                    allDead = False
            if allDead:
                executing = False
            time.sleep(3)

        # Wait for all of the tasks to finish
        #tasks.join()

        # Start printing results
        while num_jobs:
            result = results.get()
            print 'Result:', result
            num_jobs -= 1

A much simpler solution than reimplementing the Pool is to design a mechanism which times out the function you are running. For instance:

from time import sleep
import signal

class TimeoutError(Exception):
    pass    

def handler(signum, frame):
    raise TimeoutError()

def run_with_timeout(func, *args, timeout=10, **kwargs):
    # Schedule SIGALRM to fire after `timeout` seconds; the handler
    # raises TimeoutError inside the running function.
    signal.signal(signal.SIGALRM, handler)
    signal.alarm(timeout)
    try:
        res = func(*args, **kwargs)
    except TimeoutError as exc:
        print("Timeout")
        res = exc
    finally:
        signal.alarm(0)  # cancel any pending alarm once func has returned
    return res


def test():
    sleep(4)
    print("ok")

if __name__ == "__main__":
    import multiprocessing as mp

    p = mp.Pool()
    print(p.apply_async(run_with_timeout, args=(test,),
                        kwds={"timeout":1}).get())

signal.alarm sets a timeout, and when that timeout expires, it runs the handler, which stops the execution of your function.
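Used directly, without a Pool, the helper behaves the same way; a quick sketch, assuming the run_with_timeout defined above and a Unix system (signal.SIGALRM does not exist on Windows):

from time import sleep

if __name__ == "__main__":
    res = run_with_timeout(sleep, 2, timeout=5)  # finishes in time; res is None
    res = run_with_timeout(sleep, 2, timeout=1)  # prints "Timeout"; res is the TimeoutError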

EDIT: If you are using a Windows system, it is a bit more complicated, as signal does not implement SIGALRM. Another solution is to use the C-level Python API. This code has been adapted from this SO answer, with a small adaptation to work on 64-bit systems. I have only tested it on Linux, but it should work the same on Windows.

import threading
import ctypes
from time import sleep


class TimeoutError(Exception):
    pass


def run_with_timeout(func, *args, timeout=10, **kwargs):
    interrupt_tid = int(threading.get_ident())

    def interrupt_thread():
        # Call the low-level C Python API using ctypes. tid must be converted
        # to c_long to be valid.
        res = ctypes.pythonapi.PyThreadState_SetAsyncExc(
            ctypes.c_long(interrupt_tid), ctypes.py_object(TimeoutError))
        if res == 0:
            print(threading.enumerate())
            print(interrupt_tid)
            raise ValueError("invalid thread id")
        elif res != 1:
            # "if it returns a number greater than one, you're in trouble,
            # and you should call it again with exc=NULL to revert the effect"
            ctypes.pythonapi.PyThreadState_SetAsyncExc(
                ctypes.c_long(interrupt_tid), 0)
            raise SystemError("PyThreadState_SetAsyncExc failed")

    timer = threading.Timer(timeout, interrupt_thread)
    try:
        timer.start()
        res = func(*args, **kwargs)
    except TimeoutError as exc:
        print("Timeout")
        res = exc
    finally:
        timer.cancel()  # harmless if the timer has already fired
    return res


def test():
    sleep(4)
    print("ok")


if __name__ == "__main__":
    import multiprocessing as mp

    p = mp.Pool()
    print(p.apply_async(run_with_timeout, args=(test,),
                        kwds={"timeout": 1}).get())
    print(p.apply_async(run_with_timeout, args=(test,),
                        kwds={"timeout": 5}).get())

I generally recommend against subclassing multiprocessing.Process, as it leads to code that is hard to read.

I'd rather encapsulate your logic in a function and run it in a separate process. This keeps the code much cleaner and more intuitive.
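For illustration, a minimal sketch of that shape, with a plain worker function handed to multiprocessing.Process instead of a Process subclass (worker and the queue names are made up):

import multiprocessing as mp

def worker(task_queue, result_queue):
    # The same consumer loop as the Consumer subclass, as a plain function.
    for task in iter(task_queue.get, None):  # None acts as the poison pill
        result_queue.put(task())

if __name__ == "__main__":
    tasks, results = mp.Queue(), mp.Queue()
    consumers = [mp.Process(target=worker, args=(tasks, results))
                 for _ in range(4)]
    for consumer in consumers:
        consumer.start()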

Nevertheless, rather than reinventing the wheel, I'd recommend using a library which already solves the issue for you, such as Pebble or billiard.

For example, the Pebble library allows you to easily set timeouts on processes running independently or within a Pool.

Running your function within a separate process with a timeout:

from pebble import concurrent
from concurrent.futures import TimeoutError

@concurrent.process(timeout=10)
def function(foo, bar=0):
    return foo + bar

future = function(1, bar=2)

try:
    result = future.result()  # blocks until results are ready
except TimeoutError as error:
    print("Function took longer than %d seconds" % error.args[1])

The same example, but with a process Pool:

from pebble import ProcessPool

with ProcessPool(max_workers=5, max_tasks=10) as pool:
    future = pool.schedule(function, args=[1], timeout=10)

    try:
        result = future.result()  # blocks until results are ready
    except TimeoutError as error:
        print("Function took longer than %d seconds" % error.args[1])

In both cases, the timing out process will be automatically terminated for you.
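Applied to a whole batch of tasks like the 30 jobs in the question, the same idea works with Pebble's map; this sketch follows the pattern from Pebble's documentation (the work function and the input pairs are made up for illustration):

from pebble import ProcessPool
from concurrent.futures import TimeoutError

def work(pair):
    a, b = pair
    return '%s * %s = %s' % (a, b, a * b)

if __name__ == '__main__':
    with ProcessPool(max_workers=4) as pool:
        future = pool.map(work, [(i, i + 1) for i in range(30)], timeout=10)
        iterator = future.result()
        while True:
            try:
                print(next(iterator))
            except StopIteration:
                break  # all results consumed
            except TimeoutError as error:
                print("Task took longer than %d seconds" % error.args[1])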

For long-running processes and/or long iterators, spawned workers might hang after some time. To prevent this, there are two built-in techniques:

  • Restart workers after they have delivered maxtasksperchild tasks from the queue.
  • Pass timeout to pool.imap.next(), catch the TimeoutError, and finish the rest of the work in another pool (a sketch of this appears at the end of this answer).

The following wrapper implements both, as a generator. It also works when replacing the stdlib multiprocessing with multiprocess.

import multiprocessing as mp


def imap(
    func,
    iterable,
    *,
    processes=None,
    maxtasksperchild=42,
    timeout=42,
    initializer=None,
    initargs=(),
    context=mp.get_context("spawn")
):
    """Multiprocessing imap, restarting workers after maxtasksperchild tasks to avoid zombies.

    Example:
        >>> list(imap(str, range(5)))
        ['0', '1', '2', '3', '4']

    Raises:
        mp.TimeoutError: if the next result cannot be returned within timeout seconds.

    Yields:
        Ordered results as they come in.
    """
    with context.Pool(
        processes=processes,
        maxtasksperchild=maxtasksperchild,
        initializer=initializer,
        initargs=initargs,
    ) as pool:
        it = pool.imap(func, iterable)
        while True:
            try:
                yield it.next(timeout)
            except StopIteration:
                return

To catch the TimeoutError:

>>> import multiprocessing as mp
>>> import time
>>> iterable = list(range(10))
>>> results = []
>>> try:
...     for i, result in enumerate(imap(time.sleep, iterable, processes=2, timeout=2)):
...         results.append(result)
... except mp.TimeoutError:
...     print("Failed to process the following subset of iterable:", iterable[i:])
Failed to process the following subset of iterable: [2, 3, 4, 5, 6, 7, 8, 9]
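
Technique 2 from the list above, finishing the rest of the work in another pool, could then look roughly like this; run_with_restarts and its skip-the-hung-item policy are hypothetical, and it relies on the imap wrapper defined earlier (which builds a fresh Pool on every call):

import multiprocessing as mp

def run_with_restarts(func, iterable, max_restarts=3, **imap_kwargs):
    """Sketch: on a timeout, skip the hung item and retry the tail in a new pool."""
    results, remaining = [], list(iterable)
    for _ in range(max_restarts):
        i = -1
        try:
            for i, result in enumerate(imap(func, remaining, **imap_kwargs)):
                results.append(result)
            return results  # everything finished
        except mp.TimeoutError:
            # items 0..i came back in time; item i + 1 hung and is dropped
            remaining = remaining[i + 2:]
    return results  # partial results after exhausting the restarts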
