
How to restart a process in a pool when it hits an exception in Python

import signal
import asyncio
import os
import random
import multiprocessing

# 10 random numbers, one per worker process
my_list = [random.randint(1, 100) for _ in range(10)]


async def loop_item(my_item):
    while True:
        a = random.randint(1, 2)
        if a == 2:
            print(f"process id: {os.getpid()}")
            raise Exception('Error')  # simulate an arbitrary failure
        print(f"process id: {os.getpid()} - {my_item}")
        await asyncio.sleep(0.5)  # non-blocking sleep inside the coroutine


def run_loop(my_item):
    asyncio.run(loop_item(my_item))


def throw_error(e):
    # error_callback runs in the parent process and receives only the exception
    os.system('bash /root/my-script.sh')  # that launches "python my-script.py"
    os.killpg(os.getpgid(os.getpid()), signal.SIGKILL)


if __name__ == '__main__':
    pool = multiprocessing.Pool(processes=10)
    for my_item in my_list:
        pool.apply_async(run_loop, (my_item,), error_callback=throw_error)
    pool.close()
    pool.join()

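This is my demo code. It creates my_list, which contains 10 random numbers, and then starts 10 processes that print the items out, each along with its own pid.

Then I add a raise Exception to simulate any kind of exception that might occur. If an exception occurs, I want to restart this loop_item(my_item) function in a new process.

There are two obstacles here. One is passing the variable my_item, but I think I should be able to make that work with an external tool for putting/getting variables, such as Redis. Any better idea is appreciated, though.

What really blocks me is how to efficiently start that process again after it has hit an exception and exited.

So far I have been able to use the throw_error function to kill the Python script itself, or to launch another shell script that kills and relaunches the Python script, but this approach seems inefficient.

So I wonder: is there a better way to restart just the failed process instead of restarting the whole script?

One approach I tried is creating a new process pool inside the throw_error function, like: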

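def throw_error(e):
    # note: the failed item is not passed to the callback; this picks up the global my_item from the __main__ loop
    pool2 = multiprocessing.Pool(processes=1)
    pool2.apply_async(run_loop, (my_item,), error_callback=throw_error)
    pool2.close()
    pool2.join()  # blocks the calling pool's callback thread until this nested pool finishes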

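But this seems like a bad idea: after several exceptions the process pools get out of control, and hundreds or even thousands of "zombie" processes pile up.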
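A minimal sketch of a different approach, assuming the workload is a module-level function. With `multiprocessing.Pool` and `concurrent.futures.ProcessPoolExecutor`, an exception raised inside a task is sent back to the parent while the worker process keeps running, so there is no process to restart, only a task to resubmit. The parent can remember which item each future belongs to and resubmit failed items to the same executor, so no extra pools are created and `my_item` never has to go through an external store such as Redis. The names `run_with_resubmit` and `max_attempts` are hypothetical, chosen for this illustration.

```python
from concurrent.futures import FIRST_COMPLETED, Executor, Future, wait
from typing import Any, Callable, Dict, Iterable, Tuple


def run_with_resubmit(
    executor: Executor,
    func: Callable[[Any], Any],
    items: Iterable[Any],
    max_attempts: int = 3,
) -> Tuple[Dict[int, Any], Dict[int, BaseException]]:
    """Run func(item) for every item, resubmitting failures to the same executor.

    Returns (results, failures), both keyed by the item's position in `items`:
    results holds return values, failures holds the last exception of items
    that still failed after max_attempts attempts.
    """
    item_list = list(items)
    results: Dict[int, Any] = {}
    failures: Dict[int, BaseException] = {}
    # Map each pending future back to (item index, attempt number), so the
    # parent always knows which item to resubmit.
    pending: Dict[Future, Tuple[int, int]] = {
        executor.submit(func, item): (i, 1) for i, item in enumerate(item_list)
    }
    while pending:
        # Wait until at least one task has finished, then handle every finished one.
        done, _ = wait(set(pending), return_when=FIRST_COMPLETED)
        for fut in done:
            i, attempt = pending.pop(fut)
            err = fut.exception()
            if err is None:
                results[i] = fut.result()
            elif attempt < max_attempts:
                # The worker survived the exception; only the task is run again.
                pending[executor.submit(func, item_list[i])] = (i, attempt + 1)
            else:
                failures[i] = err
    return results, failures
```

With the question's code this could be called as `run_with_resubmit(executor, run_loop, my_list)` inside `with ProcessPoolExecutor(max_workers=10) as executor:` under the `if __name__ == '__main__':` guard; `run_loop` has to stay a module-level function so it can be pickled. Because the sketch relies only on the generic `Executor` interface, a `ThreadPoolExecutor` works as well, which is convenient for testing.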

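I assume this is one of those XY Problems.

This answer suggests an alternative design that solves problem X, rather than solving problem Y, i.e. restarting a process in the pool.

As far as I know, Python gives you little control over gracefully terminating child processes or threads once they have been spawned.

So the best approach is simply not to let each process/thread fail completely and propagate the error in the first place.

This can be done by writing a small wrapper that calls the function inside a try-except block catching Exception, so that it catches any exception the function raises. A while loop around it then takes care of retrying.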
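Applied directly to the question's code, the same idea means catching the exception inside the worker and calling `asyncio.run` again, so the coroutine is restarted in the same process and nothing ever reaches `error_callback`. This is a sketch under assumptions: `run_loop_with_restarts` and `max_restarts` are hypothetical names, and the coroutine function is passed in as a parameter so the helper does not depend on a global `loop_item`.

```python
import asyncio
import os
from typing import Any, Callable, Coroutine, Optional


def run_loop_with_restarts(
    coro_fn: Callable[[Any], Coroutine[Any, Any, Any]],
    my_item: Any,
    max_restarts: int = 3,
) -> Optional[BaseException]:
    """Run coro_fn(my_item) inside this worker, restarting it after an exception.

    Returns None as soon as one run finishes normally, otherwise the last
    exception once all max_restarts restarts have been used up.
    """
    last_err: Optional[BaseException] = None
    for attempt in range(max_restarts + 1):  # first run plus max_restarts restarts
        try:
            # Each run gets a fresh event loop, as in the question's run_loop.
            asyncio.run(coro_fn(my_item))
            return None
        except Exception as err:  # handled here, so the pool never sees the error
            last_err = err
            print(f"process id: {os.getpid()} - {my_item}: run {attempt + 1} failed with {err!r}")
    return last_err
```

It could be submitted as `pool.apply_async(functools.partial(run_loop_with_restarts, loop_item), (my_item,))`; both functions are module-level, so the partial can be pickled. To restart forever, as the question's endless `while True` loop implies, the bounded `for` loop would become a `while True` loop.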

def wrapper(func, max_retries, data):
    """
    Wrapper adding retry capabilities to workload.

    Args:
        func: function to execute.
        max_retries: Maximum retries until moving on.
        data: data to process.

    Returns:
        (Input data, result) Tuple.
    """

    # loop so that a failed attempt can be retried
    result = None  # stays None only if max_retries < 0
    retry_count = 0
    while retry_count <= max_retries:  # use `while True:` for unlimited retries
        retry_count += 1

        # try to process data
        try:
            result = func(data)
        except Exception as err:
            # on exception save result as error and retry
            result = err
        else:
            # otherwise it was successful, break out of retry loop
            break

    # return result
    return data, result
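To see how many calls `max_retries` actually allows, here is a quick deterministic check of the wrapper above, re-declared so the snippet runs on its own. `FailNTimes` is a hypothetical helper that raises on its first `n` calls. Because the loop condition is `retry_count <= max_retries`, `func` is called at most `max_retries + 1` times: one initial attempt plus `max_retries` retries.

```python
from typing import Any, Callable, Tuple


def wrapper(func: Callable[[Any], Any], max_retries: int, data: Any) -> Tuple[Any, Any]:
    """Same retry loop as above: returns (data, result or last exception)."""
    result: Any = None
    retry_count = 0
    while retry_count <= max_retries:
        retry_count += 1
        try:
            result = func(data)
        except Exception as err:
            result = err
        else:
            break
    return data, result


class FailNTimes:
    """Hypothetical test helper: raises on the first n calls, then doubles its input."""

    def __init__(self, n: int) -> None:
        self.n = n
        self.calls = 0

    def __call__(self, data: int) -> int:
        self.calls += 1
        if self.calls <= self.n:
            raise ValueError(f"failure {self.calls}")
        return data * 2


# Fails 3 times and succeeds on the 4th call, so max_retries=3 is just enough.
ok = FailNTimes(3)
print(wrapper(ok, 3, 21), ok.calls)  # -> (21, 42) 4

# Fails 4 times: all 4 allowed calls are used and the last error is returned.
bad = FailNTimes(4)
data, result = wrapper(bad, 3, 21)
print(data, repr(result), bad.calls)  # -> 21 ValueError('failure 4') 4
```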

Here is some silly demo code to test the idea; each call has a 50% chance of failing.

import logging
import functools
import random
from os import getpid
from multiprocessing import Pool


logging.basicConfig(format="%(levelname)-8s %(message)s", level=logging.DEBUG)
logger = logging.getLogger()


def wrapper(func, max_retries, data):
    """
    Wrapper adding retry capabilities to workload with some fancy output.

    Args:
        func: function to execute.
        max_retries: Maximum retries until moving on.
        data: data to process.

    Returns:
        (Input data, result) Tuple.
    """
    pid = f"{getpid():<6}"
    logger.info(f"[{pid}]  Processing {data}")

    # just a line to satisfy pylint
    result = None

    # loop so that a failed attempt can be retried
    retry_count = 0

    while retry_count <= max_retries:
        # try to process data
        try:
            result = func(data)
        except Exception as err:
            # on exception print out error, set result as err, then retry
            logger.error(
                f"[{pid}]  {err.__class__.__name__} while processing {data}, "
                f"{max_retries - retry_count} retries left. "
            )
            result = err
        else:
            break

        retry_count += 1

    # print and return result
    logger.info(f"[{pid}]  Processing {data} done")
    return data, result


class RogueAIException(Exception):
    pass


def workload(n):
    """
    Quite rebellious Fibonacci function
    """

    if random.randint(0, 1):
        raise RogueAIException("I'm sorry Dave, I'm Afraid I can't do that.")

    a, b = 0, 1
    for _ in range(n - 1):
        a, b = b, a + b

    return b


def main():
    data = [random.randint(0, 100) for _ in range(20)]

    # fix parameters. Decorator can't be pickled, we'll have to live with this.
    wrapped_workload = functools.partial(wrapper, workload, 3)

    with Pool(processes=3) as pool:
        # apply function for each data
        results = pool.map(wrapped_workload, data)

        print("\nInput Output")
        for fed_data, result in results:
            print(f"{fed_data:<6}{result}")


if __name__ == '__main__':
    main()

Output:

INFO     [13904 ]  Processing 40
ERROR    [13904 ]  RogueAIException while processing 40, 3 retries left. 
INFO     [13904 ]  Processing 40 done
INFO     [13904 ]  Processing 93
ERROR    [13904 ]  RogueAIException while processing 93, 3 retries left. 
INFO     [13904 ]  Processing 93 done
INFO     [13904 ]  Processing 96
INFO     [13904 ]  Processing 96 done
INFO     [13904 ]  Processing 48
INFO     [13904 ]  Processing 48 done
INFO     [13904 ]  Processing 17
INFO     [13904 ]  Processing 17 done
INFO     [13904 ]  Processing 52
ERROR    [13904 ]  RogueAIException while processing 52, 3 retries left. 
INFO     [13904 ]  Processing 52 done
INFO     [13904 ]  Processing 96
ERROR    [13904 ]  RogueAIException while processing 96, 3 retries left. 
INFO     [13904 ]  Processing 96 done
INFO     [13904 ]  Processing 23
ERROR    [13904 ]  RogueAIException while processing 23, 3 retries left. 
INFO     [13904 ]  Processing 23 done
INFO     [13904 ]  Processing 99
ERROR    [13904 ]  RogueAIException while processing 99, 3 retries left. 
ERROR    [13904 ]  RogueAIException while processing 99, 2 retries left.
INFO     [13904 ]  Processing 99 done
INFO     [13904 ]  Processing 55
ERROR    [13904 ]  RogueAIException while processing 55, 3 retries left.
ERROR    [13904 ]  RogueAIException while processing 55, 2 retries left.
INFO     [13904 ]  Processing 55 done
INFO     [13904 ]  Processing 63
ERROR    [13904 ]  RogueAIException while processing 63, 3 retries left.
INFO     [13904 ]  Processing 63 done
INFO     [13904 ]  Processing 61
INFO     [25180 ]  Processing 3
ERROR    [13904 ]  RogueAIException while processing 61, 3 retries left.
INFO     [25180 ]  Processing 3 done
INFO     [13904 ]  Processing 61 done
INFO     [25180 ]  Processing 42
INFO     [13904 ]  Processing 33
ERROR    [25180 ]  RogueAIException while processing 42, 3 retries left.
ERROR    [13904 ]  RogueAIException while processing 33, 3 retries left.
ERROR    [25180 ]  RogueAIException while processing 42, 2 retries left.
INFO     [13904 ]  Processing 33 done
ERROR    [25180 ]  RogueAIException while processing 42, 1 retries left.
INFO     [13904 ]  Processing 2
INFO     [25180 ]  Processing 42 done
INFO     [13904 ]  Processing 2 done
INFO     [25180 ]  Processing 35
INFO     [13904 ]  Processing 45
INFO     [25180 ]  Processing 35 done
INFO     [13904 ]  Processing 45 done
INFO     [25180 ]  Processing 2
INFO     [13904 ]  Processing 11
ERROR    [25180 ]  RogueAIException while processing 2, 3 retries left.
ERROR    [13904 ]  RogueAIException while processing 11, 3 retries left.
INFO     [25180 ]  Processing 2 done
ERROR    [13904 ]  RogueAIException while processing 11, 2 retries left.
ERROR    [13904 ]  RogueAIException while processing 11, 1 retries left.
ERROR    [13904 ]  RogueAIException while processing 11, 0 retries left.
INFO     [13904 ]  Processing 11 done

Input Output
40    102334155
93    12200160415121876738
96    51680708854858323072
48    4807526976
17    1597
52    32951280099
96    51680708854858323072
23    28657
99    218922995834555169026
55    139583862445
63    6557470319842
61    2504730781961
3     2
42    267914296
33    3524578
2     1
35    9227465
2     1
45    1134903170
11    I'm sorry Dave, I'm Afraid I can't do that.
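Because a wrapped call that runs out of retries returns the exception object as its result (like input 11 above), the caller can split the results afterwards and, for example, decide whether to feed the failed inputs through the pool again. A small sketch, assuming `results` is the list of `(input, result)` tuples that `pool.map(wrapped_workload, data)` returns; `split_results` is a hypothetical helper, and `RogueAIException` is re-declared only to build sample data.

```python
from typing import Any, List, Tuple


class RogueAIException(Exception):
    pass


def split_results(
    results: List[Tuple[Any, Any]],
) -> Tuple[List[Tuple[Any, Any]], List[Tuple[Any, BaseException]]]:
    """Separate (input, result) pairs into successes and failures."""
    succeeded = [(d, r) for d, r in results if not isinstance(r, BaseException)]
    failed = [(d, r) for d, r in results if isinstance(r, BaseException)]
    return succeeded, failed


# Sample shaped like the output above (values copied from the table).
sample = [
    (17, 1597),
    (2, 1),
    (11, RogueAIException("I'm sorry Dave, I'm Afraid I can't do that.")),
]
succeeded, failed = split_results(sample)
print(len(succeeded), [d for d, _ in failed])  # -> 2 [11]
```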


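Disclaimer: technical posts on this site are published under the CC BY-SA 4.0 license. If you reproduce them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.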

 
© 2020-2024 STACKOOM.COM