How to restart a process in a pool when it hits an exception in Python

import signal
import asyncio
import os
import random
import time
import multiprocessing

# Build a list of 10 random numbers to process.
my_list = []
for i in range(0, 10):
    n = random.randint(1, 100)
    my_list.append(n)

async def loop_item(my_item):
    while True:
        a = random.randint(1, 2)
        if a == 2:
            print(f"process id: {os.getpid()}")
            raise Exception('Error')
        print(f"process id: {os.getpid()} - {my_item}")

def run_loop(my_item):
    # Run the coroutine to completion inside the worker process.
    asyncio.run(loop_item(my_item))

def throw_error(e):
    os.system('bash /root/my-script.sh')  # that launches "python my-script.py"
    os.killpg(os.getpgid(os.getpid()), signal.SIGKILL)

if __name__ == '__main__':
    pool = multiprocessing.Pool(processes=10)
    for my_item in my_list:
        pool.apply_async(run_loop, (my_item,), error_callback=throw_error)
    # Wait for the workers; without this the parent exits immediately.
    pool.close()
    pool.join()

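This is my demo code. It creates my_list, which contains 10 random numbers,

then starts 10 processes, each of which prints its item together with its PID.

I then added a raise Exception to simulate any kind of exception that might occur. When an exception occurs, I want to restart this loop_item(my_item) function in a new process.

There are two obstacles. One is passing the variable my_item; I think I could make that work with an external tool such as Redis to put/get the variable, but any better idea is appreciated.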
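On the first obstacle, an external store such as Redis may not be needed: apply_async already delivers my_item to the worker, and functools.partial can bind the same value into the error callback so the parent knows which item failed. The following is a minimal sketch under that assumption; always_fails and record_failure are hypothetical names used only for illustration.

```python
import functools
import multiprocessing


def always_fails(my_item):
    """Hypothetical stand-in for run_loop: always raises."""
    raise RuntimeError(f"item {my_item} failed")


def record_failure(failures, my_item, exc):
    """Error callback; runs in the parent process with my_item pre-bound."""
    failures.append((my_item, str(exc)))


if __name__ == "__main__":
    failures = []
    with multiprocessing.Pool(processes=2) as pool:
        handles = [
            pool.apply_async(
                always_fails,
                (item,),
                # Bind the list and the item; the pool supplies only the exception.
                error_callback=functools.partial(record_failure, failures, item),
            )
            for item in (17, 42)
        ]
        for handle in handles:
            handle.wait()  # returns once the task has finished and its callback ran
    print(sorted(failures))
```

Because the callback runs in the parent process, it can append to an ordinary list; nothing has to be shared with the workers.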


So far I have been able to use the throw_error function to kill the Python script itself, or to launch another shell script that kills and relaunches the Python script, but this approach seems inefficient.


One approach I tried is creating a new process pool inside the throw_error function, for example:

def throw_error(e):
    pool2 = multiprocessing.Pool(processes=1)
    pool2.apply_async(run_loop, (my_item,), error_callback=throw_error)
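An alternative to building a second pool is to resubmit the failed item to the same pool from the error callback. The sketch below is one possible way to do that, not a tested fix for the code above: flaky_task, run_with_resubmit, and the attempt argument are all hypothetical, and it assumes the items are unique and hashable. The pool is kept open until every item has either succeeded or used up its attempts.

```python
import functools
import multiprocessing
import threading


def flaky_task(my_item, attempt):
    """Hypothetical workload: fails on the first two attempts, then succeeds."""
    if attempt < 3:
        raise RuntimeError(f"item {my_item} failed on attempt {attempt}")
    return my_item * 2


def run_with_resubmit(items, max_attempts=3, processes=2):
    """Run flaky_task per unique item, resubmitting failures to the same pool."""
    results = {}
    remaining = len(items)
    all_done = threading.Event()
    if remaining == 0:
        return results

    with multiprocessing.Pool(processes=processes) as pool:

        def finish(my_item, outcome):
            # Runs in the parent once an item succeeded or ran out of attempts.
            nonlocal remaining
            results[my_item] = outcome
            remaining -= 1
            if remaining == 0:
                all_done.set()

        def on_error(my_item, attempt, exc):
            # Runs in the parent; the item and attempt number are pre-bound.
            if attempt < max_attempts:
                submit(my_item, attempt + 1)
            else:
                finish(my_item, exc)

        def submit(my_item, attempt):
            pool.apply_async(
                flaky_task,
                (my_item, attempt),
                callback=functools.partial(finish, my_item),
                error_callback=functools.partial(on_error, my_item, attempt),
            )

        for my_item in items:
            submit(my_item, 1)
        all_done.wait()  # keep the pool open until every item is settled

    return results


if __name__ == "__main__":
    print(run_with_resubmit([1, 2, 3]))
```

The callbacks are never sent to the workers, so they can be closures; only flaky_task and its arguments need to be picklable.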


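I suspect this is an XY problem.

This answer proposes an alternative design that solves problem X, rather than solving problem Y, which is restarting a process in the pool.

As far as I know, Python offers little control over gracefully terminating child processes or threads that have already been spawned.

So the best approach is simply not to let each process/thread fail outright and propagate the error in the first place.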
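It may also help to see that an ordinary exception raised inside a multiprocessing.Pool task does not kill the worker process: the pool sends the exception back to the parent and the worker goes on to take the next task. The sketch below illustrates this with a single-worker pool; risky is a hypothetical task invented for the demonstration.

```python
import multiprocessing
import os


def risky(n):
    """Hypothetical task: raises for odd n, otherwise reports the worker's PID."""
    if n % 2:
        raise ValueError(f"cannot handle {n}")
    return os.getpid()


if __name__ == "__main__":
    with multiprocessing.Pool(processes=1) as pool:
        first_pid = pool.apply(risky, (0,))
        try:
            pool.apply(risky, (1,))  # the exception is re-raised in the parent
        except ValueError as err:
            print(f"parent caught: {err}")
        # The same worker should still be serving tasks after the failure.
        print("same worker:", pool.apply(risky, (2,)) == first_pid)
```

If that holds for the real workload, there is no process to restart; only the failed task needs to be run again.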

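This can be done by writing a small wrapper that runs the function inside a try-except block catching Exception, so that it catches any exception the function raises. A while loop then handles the retries.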

def wrapper(func, max_retries, data):
    """
    Wrapper adding retry capabilities to workload.

    Args:
        func: function to execute.
        max_retries: Maximum retries until moving on.
        data: data to process.

    Returns:
        (Input data, result) Tuple.
    """

    # wrap in while in case for retry.
    retry_count = 0
    while retry_count <= max_retries:  # while True: if you want infinite loop
        retry_count += 1

        # try to process data
        try:
            result = func(data)
        except Exception as err:
            # on exception save result as error and retry
            result = err
        else:
            # otherwise it was successful, break out of retry loop
            break

    # return result
    return data, result
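
To make the wrapper's contract concrete, here is a small in-process sketch with no pool involved. It restates the wrapper in compact form and drives it with two hypothetical functions, fails_twice and always_fails, so the behavior is deterministic: a task gets max_retries + 1 attempts in total, and when all of them fail the last exception is returned in place of a result.

```python
def wrapper(func, max_retries, data):
    """Compact restatement of the retry wrapper above."""
    result = None
    for _ in range(max_retries + 1):
        try:
            result = func(data)
        except Exception as err:
            result = err  # remember the error and try again
        else:
            break  # success, stop retrying
    return data, result


calls = {"count": 0}


def fails_twice(data):
    """Hypothetical workload: raises on its first two calls, then succeeds."""
    calls["count"] += 1
    if calls["count"] <= 2:
        raise RuntimeError("transient failure")
    return data * 2


def always_fails(data):
    """Hypothetical workload that never succeeds."""
    raise ValueError(f"cannot process {data!r}")


data, result = wrapper(fails_twice, 3, 21)
print(data, result, calls["count"])  # 21 42 3

data, result = wrapper(always_fails, 1, "x")
print(data, type(result).__name__)   # x ValueError
```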


import logging
import functools
import random
from os import getpid
from multiprocessing import Pool

logging.basicConfig(format="%(levelname)-8s %(message)s", level=logging.DEBUG)
logger = logging.getLogger()

def wrapper(func, max_retries, data):
    """
    Wrapper adding retry capabilities to workload with some fancy output.

    Args:
        func: function to execute.
        max_retries: Maximum retries until moving on.
        data: data to process.

    Returns:
        (Input data, result) Tuple.
    """
    pid = f"{getpid():<6}"
    logger.info(f"[{pid}]  Processing {data}")

    # just a line to satisfy pylint
    result = None

    # wrap in while in case for retry.
    retry_count = 0

    while retry_count <= max_retries:
        # try to process data
        try:
            result = func(data)
        except Exception as err:
            # on exception print out error, set result as err, then retry
            logger.error(
                f"[{pid}]  {err.__class__.__name__} while processing {data}, "
                f"{max_retries - retry_count} retries left. "
            )
            result = err
        else:
            # success, break out of retry loop
            break

        retry_count += 1

    # print and return result
    logger.info(f"[{pid}]  Processing {data} done")
    return data, result

class RogueAIException(Exception):
    """Raised when the workload refuses to cooperate."""

def workload(n):
    """
    Quite rebellious Fibonacci function
    """

    if random.randint(0, 1):
        raise RogueAIException("I'm sorry Dave, I'm Afraid I can't do that.")

    a, b = 0, 1
    for _ in range(n - 1):
        a, b = b, a + b

    return b

def main():
    data = [random.randint(0, 100) for _ in range(20)]

    # fix parameters. Decorator can't be pickled, we'll have to live with this.
    wrapped_workload = functools.partial(wrapper, workload, 3)

    with Pool(processes=3) as pool:
        # apply function for each data
        results = pool.map(wrapped_workload, data)

        print("\nInput Output")
        for fed_data, result in results:
            print(f"{fed_data:<6}{result}")

if __name__ == '__main__':
    main()

INFO     [13904 ]  Processing 40
ERROR    [13904 ]  RogueAIException while processing 40, 3 retries left. 
INFO     [13904 ]  Processing 40 done
INFO     [13904 ]  Processing 93
ERROR    [13904 ]  RogueAIException while processing 93, 3 retries left. 
INFO     [13904 ]  Processing 93 done
INFO     [13904 ]  Processing 96
INFO     [13904 ]  Processing 96 done
INFO     [13904 ]  Processing 48
INFO     [13904 ]  Processing 48 done
INFO     [13904 ]  Processing 17
INFO     [13904 ]  Processing 17 done
INFO     [13904 ]  Processing 52
ERROR    [13904 ]  RogueAIException while processing 52, 3 retries left. 
INFO     [13904 ]  Processing 52 done
INFO     [13904 ]  Processing 96
ERROR    [13904 ]  RogueAIException while processing 96, 3 retries left. 
INFO     [13904 ]  Processing 96 done
INFO     [13904 ]  Processing 23
ERROR    [13904 ]  RogueAIException while processing 23, 3 retries left. 
INFO     [13904 ]  Processing 23 done
INFO     [13904 ]  Processing 99
ERROR    [13904 ]  RogueAIException while processing 99, 3 retries left. 
ERROR    [13904 ]  RogueAIException while processing 99, 2 retries left.
INFO     [13904 ]  Processing 99 done
INFO     [13904 ]  Processing 55
ERROR    [13904 ]  RogueAIException while processing 55, 3 retries left.
ERROR    [13904 ]  RogueAIException while processing 55, 2 retries left.
INFO     [13904 ]  Processing 55 done
INFO     [13904 ]  Processing 63
ERROR    [13904 ]  RogueAIException while processing 63, 3 retries left.
INFO     [13904 ]  Processing 63 done
INFO     [13904 ]  Processing 61
INFO     [25180 ]  Processing 3
ERROR    [13904 ]  RogueAIException while processing 61, 3 retries left.
INFO     [25180 ]  Processing 3 done
INFO     [13904 ]  Processing 61 done
INFO     [25180 ]  Processing 42
INFO     [13904 ]  Processing 33
ERROR    [25180 ]  RogueAIException while processing 42, 3 retries left.
ERROR    [13904 ]  RogueAIException while processing 33, 3 retries left.
ERROR    [25180 ]  RogueAIException while processing 42, 2 retries left.
INFO     [13904 ]  Processing 33 done
ERROR    [25180 ]  RogueAIException while processing 42, 1 retries left.
INFO     [13904 ]  Processing 2
INFO     [25180 ]  Processing 42 done
INFO     [13904 ]  Processing 2 done
INFO     [25180 ]  Processing 35
INFO     [13904 ]  Processing 45
INFO     [25180 ]  Processing 35 done
INFO     [13904 ]  Processing 45 done
INFO     [25180 ]  Processing 2
INFO     [13904 ]  Processing 11
ERROR    [25180 ]  RogueAIException while processing 2, 3 retries left.
ERROR    [13904 ]  RogueAIException while processing 11, 3 retries left.
INFO     [25180 ]  Processing 2 done
ERROR    [13904 ]  RogueAIException while processing 11, 2 retries left.
ERROR    [13904 ]  RogueAIException while processing 11, 1 retries left.
ERROR    [13904 ]  RogueAIException while processing 11, 0 retries left.
INFO     [13904 ]  Processing 11 done

Input Output
40    102334155
93    12200160415121876738
96    51680708854858323072
48    4807526976
17    1597
52    32951280099
96    51680708854858323072
23    28657
99    218922995834555169026
55    139583862445
63    6557470319842
61    2504730781961
3     2
42    267914296
33    3524578
2     1
35    9227465
2     1
45    1134903170
11    I'm sorry Dave, I'm Afraid I can't do that.
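
In the last row, the "result" for input 11 is the exception itself, because all retries were used up. A caller would usually want to separate such entries from real results. The sketch below shows one way to do that with isinstance; split_results is a hypothetical helper, RogueAIException is redefined only to keep the example self-contained, and the sample tuples are copied from the output above.

```python
class RogueAIException(Exception):
    """Stand-in for the exception class used in the demo."""


def split_results(results):
    """Separate (data, result) pairs into successes and exhausted failures."""
    succeeded = [(d, r) for d, r in results if not isinstance(r, Exception)]
    failed = [(d, r) for d, r in results if isinstance(r, Exception)]
    return succeeded, failed


sample = [
    (17, 1597),
    (11, RogueAIException("I'm sorry Dave, I'm Afraid I can't do that.")),
    (3, 2),
]

succeeded, failed = split_results(sample)
print(succeeded)                  # [(17, 1597), (3, 2)]
print([d for d, _ in failed])     # [11]
```

The failed inputs could then be logged, reported, or fed back into the pool for another round.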


