Python Pool - How do I keep Python Process running when I get a timeout exception?
how to restart process in pool when hit exception in python
import signal
import asyncio
import os
import random
import time
import multiprocessing

my_list = []
for i in range(0, 10):
    n = random.randint(1, 100)
    my_list.append(n)

async def loop_item(my_item):
    while True:
        a = random.randint(1, 2)
        if a == 2:
            print(f"process id: {os.getpid()}")
            raise Exception('Error')
        print(f"process id: {os.getpid()} - {my_item}")
        time.sleep(0.5)

def run_loop(my_item):
    asyncio.run(loop_item(my_item))

def throw_error(e):
    os.system('bash /root/my-script.sh')  # that launches "python my-script.py"
    os.killpg(os.getpgid(os.getpid()), signal.SIGKILL)

if __name__ == '__main__':
    pool = multiprocessing.Pool(processes=10)
    for my_item in my_list:
        pool.apply_async(run_loop, (my_item,), error_callback=throw_error)
    pool.close()
    pool.join()
This is my demo code. It creates a my_list containing 10 random numbers, then starts 10 processes that each print their item along with the process id. I then added a raise Exception to simulate any kind of exception that might occur. When an exception happens, I want to restart loop_item(my_item) in a new process.
There are two obstacles here. One is passing the variable my_item, but I think I could make that work with an external put/get store such as Redis; any better ideas are appreciated.
What really blocks me is how to efficiently start the process again after it hits the exception and exits.
So far I have been able to use the throw_error function to kill the Python script itself, or to launch another shell script that kills and relaunches the Python script, but this approach seems inefficient.
So I wonder: is there a better way to restart a process that raised an exception, rather than restarting the whole script?
One approach I tried was creating a new process pool inside the throw_error function, like:
def throw_error(e):
    pool2 = multiprocessing.Pool(processes=1)
    pool2.apply_async(run_loop, (my_item,), error_callback=throw_error)
    pool2.close()
    pool2.join()
But this seems like a bad idea: after several exceptions the pools get out of control and hundreds or even thousands of "zombie" processes pile up.
I assume this is one of those XY Problems.
This answer suggests an alternative design that solves problem X, rather than solving problem Y - that is, restarting processes inside the pool.
As far as I know, Python has little control over the graceful termination of already-spawned child processes or threads.
Therefore, the best approach is simply to not let each process/thread fail completely and propagate the error in the first place.
This can be achieved by writing a small wrapper that wraps the function in a try-except block catching Exception - which will then catch any exception it encounters. We can then use a while loop for retries.
def wrapper(func, max_retries, data):
    """
    Wrapper adding retry capabilities to workload.

    Args:
        func: function to execute.
        max_retries: Maximum retries until moving on.
        data: data to process.

    Returns:
        (Input data, result) Tuple.
    """
    # wrap in while in case of retry.
    retry_count = 0
    while retry_count <= max_retries:  # while True: if you want an infinite loop
        retry_count += 1
        # try to process data
        try:
            result = func(data)
        except Exception as err:
            # on exception save result as error and retry
            result = err
        else:
            # otherwise it was successful, break out of retry loop
            break
    # return result
    return data, result
Here is some silly demo code testing this idea, with a 50% chance of failure.
import logging
import functools
import random
from os import getpid
from multiprocessing import Pool

logging.basicConfig(format="%(levelname)-8s %(message)s", level=logging.DEBUG)
logger = logging.getLogger()

def wrapper(func, max_retries, data):
    """
    Wrapper adding retry capabilities to workload with some fancy output.

    Args:
        func: function to execute.
        max_retries: Maximum retries until moving on.
        data: data to process.

    Returns:
        (Input data, result) Tuple.
    """
    pid = f"{getpid():<6}"
    logger.info(f"[{pid}] Processing {data}")

    # just a line to satisfy pylint
    result = None

    # wrap in while in case of retry.
    retry_count = 0
    while retry_count <= max_retries:
        # try to process data
        try:
            result = func(data)
        except Exception as err:
            # on exception print out error, set result as err, then retry
            logger.error(
                f"[{pid}] {err.__class__.__name__} while processing {data}, "
                f"{max_retries - retry_count} retries left. "
            )
            result = err
        else:
            break
        retry_count += 1

    # print and return result
    logger.info(f"[{pid}] Processing {data} done")
    return data, result

class RogueAIException(Exception):
    pass

def workload(n):
    """
    Quite rebellious Fibonacci function
    """
    if random.randint(0, 1):
        raise RogueAIException("I'm sorry Dave, I'm Afraid I can't do that.")

    a, b = 0, 1
    for _ in range(n - 1):
        a, b = b, a + b
    return b

def main():
    data = [random.randint(0, 100) for _ in range(20)]

    # fix parameters. Decorator can't be pickled, we'll have to live with this.
    wrapped_workload = functools.partial(wrapper, workload, 3)

    with Pool(processes=3) as pool:
        # apply function for each data
        results = pool.map(wrapped_workload, data)

    print("\nInput Output")
    for fed_data, result in results:
        print(f"{fed_data:<6}{result}")

if __name__ == '__main__':
    main()
Output:
INFO [13904 ] Processing 40
ERROR [13904 ] RogueAIException while processing 40, 3 retries left.
INFO [13904 ] Processing 40 done
INFO [13904 ] Processing 93
ERROR [13904 ] RogueAIException while processing 93, 3 retries left.
INFO [13904 ] Processing 93 done
INFO [13904 ] Processing 96
INFO [13904 ] Processing 96 done
INFO [13904 ] Processing 48
INFO [13904 ] Processing 48 done
INFO [13904 ] Processing 17
INFO [13904 ] Processing 17 done
INFO [13904 ] Processing 52
ERROR [13904 ] RogueAIException while processing 52, 3 retries left.
INFO [13904 ] Processing 52 done
INFO [13904 ] Processing 96
ERROR [13904 ] RogueAIException while processing 96, 3 retries left.
INFO [13904 ] Processing 96 done
INFO [13904 ] Processing 23
ERROR [13904 ] RogueAIException while processing 23, 3 retries left.
INFO [13904 ] Processing 23 done
INFO [13904 ] Processing 99
ERROR [13904 ] RogueAIException while processing 99, 3 retries left.
ERROR [13904 ] RogueAIException while processing 99, 2 retries left.
INFO [13904 ] Processing 99 done
INFO [13904 ] Processing 55
ERROR [13904 ] RogueAIException while processing 55, 3 retries left.
ERROR [13904 ] RogueAIException while processing 55, 2 retries left.
INFO [13904 ] Processing 55 done
INFO [13904 ] Processing 63
ERROR [13904 ] RogueAIException while processing 63, 3 retries left.
INFO [13904 ] Processing 63 done
INFO [13904 ] Processing 61
INFO [25180 ] Processing 3
ERROR [13904 ] RogueAIException while processing 61, 3 retries left.
INFO [25180 ] Processing 3 done
INFO [13904 ] Processing 61 done
INFO [25180 ] Processing 42
INFO [13904 ] Processing 33
ERROR [25180 ] RogueAIException while processing 42, 3 retries left.
ERROR [13904 ] RogueAIException while processing 33, 3 retries left.
ERROR [25180 ] RogueAIException while processing 42, 2 retries left.
INFO [13904 ] Processing 33 done
ERROR [25180 ] RogueAIException while processing 42, 1 retries left.
INFO [13904 ] Processing 2
INFO [25180 ] Processing 42 done
INFO [13904 ] Processing 2 done
INFO [25180 ] Processing 35
INFO [13904 ] Processing 45
INFO [25180 ] Processing 35 done
INFO [13904 ] Processing 45 done
INFO [25180 ] Processing 2
INFO [13904 ] Processing 11
ERROR [25180 ] RogueAIException while processing 2, 3 retries left.
ERROR [13904 ] RogueAIException while processing 11, 3 retries left.
INFO [25180 ] Processing 2 done
ERROR [13904 ] RogueAIException while processing 11, 2 retries left.
ERROR [13904 ] RogueAIException while processing 11, 1 retries left.
ERROR [13904 ] RogueAIException while processing 11, 0 retries left.
INFO [13904 ] Processing 11 done
Input Output
40 102334155
93 12200160415121876738
96 51680708854858323072
48 4807526976
17 1597
52 32951280099
96 51680708854858323072
23 28657
99 218922995834555169026
55 139583862445
63 6557470319842
61 2504730781961
3 2
42 267914296
33 3524578
2 1
35 9227465
2 1
45 1134903170
11 I'm sorry Dave, I'm Afraid I can't do that.
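Since the question's loop_item is an infinite loop that never returns, the wrapper can also be turned inside out: catch the exception inside the loop and keep going in the same worker process, instead of letting the process die and relaunching it externally. A minimal, hedged sketch (the resilient_loop name and the bounded iterations parameter are for demonstration only; the question's version would loop forever):

```python
import random

def resilient_loop(my_item, iterations=10):
    """Keep the loop body alive across exceptions instead of letting the
    worker process die and be restarted from the outside."""
    completed = 0
    while completed < iterations:  # the question's version would be: while True
        try:
            # fail roughly half of the time, like the question's raise Exception
            if random.random() < 0.5:
                raise RuntimeError("simulated failure")
            completed += 1
        except RuntimeError:
            # swallow the error and retry in the same process
            pass
    return completed

print(resilient_loop("item-1"))
```

Because the exception never escapes the loop, the pool worker never dies, and no restart machinery (shell scripts, extra pools, Redis hand-off of my_item) is needed at all.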