[英]Python multiprocessing - Build up of something causing script to hang?
I have a function that scrolls an image, and a subprocess within it that generates the next image behind the scenes to display, so that the transition between the displaying image and the new image is seamless.我有一个滚动图像的 function,其中有一个子进程在幕后生成下一个要显示的图像,以便显示图像和新图像之间的过渡是无缝的。
The script works fine at the start.该脚本在开始时运行良好。 However, after a few hours of using this script, the subprocess eventually gets stuck on a write call.
但是,在使用此脚本几个小时后,子进程最终会卡在写入调用上。 Killing the subprocesses doesn't help, as the next subprocess spawned also gets stuck.
杀死子进程没有帮助,因为产生的下一个子进程也会卡住。 So it definitely means something is building up and is not clearing up in the script, but I'm not sure what.
所以这绝对意味着脚本中的某些东西正在建立并且没有清理,但我不确定是什么。
The only way to fix this is to kill the parent process, and run it again.解决此问题的唯一方法是终止父进程,然后再次运行它。
def updateMultiple(self, options):
# print('options', options)
for option in options:
if option not in ['Custom GIFs', 'Custom Images', 'Custom Messages']: # these images are already saved in user uploads, dodnt need to update them
img = self.functions[option]()
img.save('./display_images/'+ option+ '.ppm')
def scrollFunctionsAnimated(self, options, animation = 'down', repeat = True):
# scrolls trhough all functions with animation. Updates functions and remakes images when each function not being dispplayed
self.updateMultiple([options[0]])
kill = False
i = 0 # keep track of which image we are displaying
self.double_buffer = self.matrix.CreateFrameCanvas()
while True:
update_process = Process(target = self.updateMultiple, args = ([options[(i+1) % len(options)]],))
update_process.start()
update_process.join()
settings = json.load(open(self.JSONs[options[(i) % len(options)]]))
self.set_delay(settings['speed'])
animation = settings['animation'].lower()
if options[i % len(options)] == 'Custom Images':
images = self.getUserImages()
elif options[i % len(options)] == 'Custom GIFs':
images = self.getUserGIFs()
elif options[i % len(options)] == 'Custom Messages':
images = self.getUserMessages()
else: #these options just have a single ppm image
image = self.openImage('./display_images/' + options[i % len(options)] +'.ppm')
images = [image]
for image in images:
img_width, img_height = image.size
offset_x = 0
if animation == 'continuous':
offset_x = 128
elif animation in ['up', 'down']:
offset_x = max(0, 128-img_width)
offset_y = 0
#first scroll image in from bottom
frame_skip = int((1/15)/self.delay) #controls how fast gifs run
self.frame = 0
pause_frames = int(0.5/self.delay)
if animation == 'up':
offset_y = 33
direction = -1
kill = self.scrollImageY(image, direction = direction, offset_x = offset_x, offset_y = offset_y, frame_skip = frame_skip, gif = options[i % len(options)] == 'Custom GIFs')
elif animation == 'down':
direction = 1
offset_y = -33
kill = self.scrollImageY(image, direction = direction, offset_x = offset_x, offset_y = offset_y, frame_skip = frame_skip, gif = options[i % len(options)] == 'Custom GIFs')
if kill: break
offset_y = 0
if animation in ['up', 'down']:
while pause_frames > 0:
if pause_frames%frame_skip == 0:
self.incrementGIF(image)
pause_frames -=1
if options[i % len(options)] != 'Custom GIFs':
self.double_buffer.SetImage(image, offset_x, offset_y)
else:
self.double_buffer.SetImage(image.convert('RGB'), offset_x, offset_y)
self.double_buffer = self.matrix.SwapOnVSync(self.double_buffer)
time.sleep(self.delay)
kill = self.checkKilled()
if kill: break
if kill: break
if kill: break
try:
pause = float(settings['pause'])
except:
pause = 0
pause_frames = int(float(pause)/self.delay)
kill = self.scrollImage(image, offset_x = offset_x, offset_y = offset_y, frame_skip = frame_skip, gif = options[i % len(options)] == 'Custom GIFs', pause_frames = pause_frames)
if kill: break
if kill:break
if not repeat:
break
update_process.join()
update_process.terminate()
i+=1
File descriptors of the hanging subprocess:挂起子进程的文件描述符:
I also have logs from strace below:我也有来自 strace 下面的日志:
This is what it looks like when the script is working normally.这是脚本正常运行时的样子。
strace: Process 1693 attached
[pid 1693] close(4) = 0 <0.000028>
[pid 1693] openat(AT_FDCWD, "/dev/null", O_RDONLY|O_LARGEFILE|O_CLOEXEC) = 4 <0.000077>
[pid 1693] write(1, "options ['Forex']\n", 18) = 18 <0.000048>
[pid 1693] openat(AT_FDCWD, "csv/forex_settings.json", O_RDONLY|O_LARGEFILE|O_CLOEXEC) = 6 <0.000064>
[pid 1693] read(6, "{\"feature\": \"Stocks\", \"speed\": \""..., 348) = 347 <0.000034>
[pid 1693] read(6, "", 1) = 0 <0.000025>
[pid 1693] close(6) = 0 <0.000039>
[pid 1693] openat(AT_FDCWD, "feature_titles/forex.png", O_RDONLY|O_LARGEFILE|O_CLOEXEC) = 6 <0.000062>
[pid 1693] read(6, "\211PNG\r\n\32\n\0\0\0\rIHDR\0\0\0`\0\0\0 \4\3\0\0\0\37\356`"..., 4096) = 388 <0.000035>
[pid 412] close(5) = 0 <0.000064>
[pid 412] openat(AT_FDCWD, "csv/crypto_settings.json", O_RDONLY|O_LARGEFILE|O_CLOEXEC) = 5 <0.000143>
[pid 412] read(5, "{\"feature\": \"Stocks\", \"speed\": \""..., 349) = 348 <0.000060>
[pid 412] read(5, "", 1) = 0 <0.000040>
[pid 412] close(5) = 0 <0.000097>
[pid 412] openat(AT_FDCWD, "./display_images/Crypto.ppm", O_RDONLY|O_LARGEFILE|O_CLOEXEC) = 5 <0.000125>
[pid 1693] close(6) = 0 <0.000027>
[pid 412] read(5, "P6\n569 32\n255\n\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4096) = 4096 <0.000052>
[pid 1693] openat(AT_FDCWD, "/home/pi/logos/stocks/down-1.png", O_RDONLY|O_LARGEFILE|O_CLOEXEC) = 6 <0.000154>
[pid 412] read(5, "P6\n569 32\n255\n\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4096) = 4096 <0.000073>
[pid 412] read(5, <unfinished ...>
[pid 1693] read(6, "\211PNG\r\n\32\n\0\0\0\rIHDR\0\0\0\20\0\0\0\16\4\3\0\0\0\324\1\201"..., 4096) = 199 <0.000035>
[pid 412] <... read resumed> "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 61440) = 50542 <0.000554>
[pid 412] read(5, "", 8192) = 0 <0.000047>
[pid 412] close(5) = 0 <0.000095>
[pid 1693] close(6) = 0 <0.000044>
[pid 1693] openat(AT_FDCWD, "/home/pi/logos/currencies/EUR.png", O_RDONLY|O_LARGEFILE|O_CLOEXEC) = 6 <0.000074>
[pid 1693] read(6, "\211PNG\r\n\32\n\0\0\0\rIHDR\0\0\0\26\0\0\0\26\10\3\0\0\0\363j\234"..., 4096) = 968 <0.000040>
[pid 1693] openat(AT_FDCWD, "/home/pi/logos/currencies/USD.png", O_RDONLY|O_LARGEFILE|O_CLOEXEC) = 7 <0.000146>
[pid 1693] read(7, "\211PNG\r\n\32\n\0\0\0\rIHDR\0\0\0\26\0\0\0\26\10\3\0\0\0\363j\234"..., 4096) = 984 <0.000038>
[pid 1693] close(7) = 0 <0.000056>
[pid 1693] close(6) = 0 <0.000042>
[pid 1693] openat(AT_FDCWD, "/home/pi/logos/stocks/up-1.png", O_RDONLY|O_LARGEFILE|O_CLOEXEC) = 6 <0.000068>
[pid 1693] read(6, "\211PNG\r\n\32\n\0\0\0\rIHDR\0\0\0\20\0\0\0\16\4\3\0\0\0\324\1\201"..., 4096) = 192 <0.000039>
[pid 1693] close(6) = 0 <0.000045>
[pid 1693] openat(AT_FDCWD, "/home/pi/logos/currencies/USD.png", O_RDONLY|O_LARGEFILE|O_CLOEXEC) = 6 <0.000112>
[pid 1693] read(6, "\211PNG\r\n\32\n\0\0\0\rIHDR\0\0\0\26\0\0\0\26\10\3\0\0\0\363j\234"..., 4096) = 984 <0.000039>
[pid 1693] openat(AT_FDCWD, "/home/pi/logos/currencies/JPY.png", O_RDONLY|O_LARGEFILE|O_CLOEXEC) = 7 <0.000151>
[pid 1693] read(7, "\211PNG\r\n\32\n\0\0\0\rIHDR\0\0\0\26\0\0\0\26\10\3\0\0\0\363j\234"..., 4096) = 769 <0.000047>
[pid 1693] close(7) = 0 <0.000060>
[pid 1693] close(6) = 0 <0.000043>
[pid 1693] openat(AT_FDCWD, "./display_images/Forex.ppm", O_RDWR|O_CREAT|O_TRUNC|O_LARGEFILE|O_CLOEXEC, 0666) = 6 <0.000359>
[pid 1693] write(6, "P6\n507 32\n255\n", 14) = 14 <0.000104>
[pid 1693] write(6, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 48672) = 48672 <0.000269>
[pid 1693] close(6) = 0 <0.000400>
[pid 1693] +++ exited with 0 +++
[pid 412] --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=1693, si_uid=1, si_status=0, si_utime=5, si_stime=9} ---
[pid 412] close(4) = 0 <0.000064>
About 40-60 minutes later of running the Python script, here are the strace results of the hanging subprocess .运行 Python 脚本大约 40-60 分钟后,这里是挂起子进程的 strace 结果。 PID 2220 seems to be stuck with the write(1, "options ['Forex']\n", 18 <unfinished...> syscall? How can I check what is causing it to hang?
PID 2220 似乎被 write(1, "options ['Forex']\n", 18 <unfinished...> 系统调用卡住了?如何检查导致它挂起的原因?
strace: Process 2220 attached
[pid 412] close(5) = 0 <0.000048>
[pid 412] openat(AT_FDCWD, "csv/crypto_settings.json", O_RDONLY|O_LARGEFILE|O_CLOEXEC) = 5 <0.000163>
[pid 412] read(5, "{\"feature\": \"Stocks\", \"speed\": \""..., 349) = 348 <0.000071>
[pid 2220] close(4 <unfinished ...>
[pid 412] read(5, <unfinished ...>
[pid 2220] <... close resumed> ) = 0 <0.000083>
[pid 412] <... read resumed> "", 1) = 0 <0.000105>
[pid 2220] openat(AT_FDCWD, "/dev/null", O_RDONLY|O_LARGEFILE|O_CLOEXEC <unfinished ...>
[pid 412] close(5 <unfinished ...>
[pid 2220] <... openat resumed> ) = 4 <0.000137>
[pid 412] <... close resumed> ) = 0 <0.000136>
[pid 412] openat(AT_FDCWD, "./display_images/Crypto.ppm", O_RDONLY|O_LARGEFILE|O_CLOEXEC) = 5 <0.000099>
[pid 412] read(5, "P6\n569 32\n255\n\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4096) = 4096 <0.000046>
[pid 2220] write(1, "options ['Forex']\n", 18 <unfinished ...>
[pid 412] read(5, "P6\n569 32\n255\n\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4096) = 4096 <0.000079>
[pid 412] read(5, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 61440) = 50542 <0.000322>
[pid 412] read(5, "", 8192) = 0 <0.000067>
[pid 412] close(5) = 0 <0.000048>
[pid 2220] <... write resumed> ) = ? ERESTARTSYS (To be restarted if SA_RESTART is set) <9.479316>
[pid 2220] --- SIGTERM {si_signo=SIGTERM, si_code=SI_USER, si_pid=412, si_uid=1} ---
[pid 2220] +++ killed by SIGTERM +++
[pid 412] --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_KILLED, si_pid=2220, si_uid=1, si_status=SIGTERM, si_utime=0, si_stime=1} ---
[pid 412] close(4) = 0 <0.000218>
It should be hang in the following line,它应该挂在下面一行,
[pid 2220] write(1, "options ['Forex']\n", 18 <unfinished ...>
you can lookup use cat /proc/<pid>/stack
to it callstack, check it a few times to see if it has changed, if it doesn't change, it means it's hanging你可以使用
cat /proc/<pid>/stack
来查找它的callstack,检查几次看它是否有变化,如果没有变化,说明它挂了
fd = 1 -> /dev/pts/0, the stdout is the remote terminal, it is recommended that you redirect to the file when running python, eg: fd = 1 -> /dev/pts/0,stdout为远程终端,建议运行python时重定向到该文件,eg:
python test.py >test.log 2>&1 0>/dev/null
root@dev8:~/tmp# cat test.py
import time
import sys
while True:
print(time.time())
sys.stdout.flush()
time.sleep(1)
root@dev8:~/tmp# python test.py >test.log 2>&1 0>/dev/null &
[1] 4203
root@dev8:~/tmp# ls -l /proc/4203/fd
total 0
l-wx------ 1 root root 64 Apr 6 19:16 0 -> /dev/null
l-wx------ 1 root root 64 Apr 6 19:16 1 -> /root/tmp/test.log
l-wx------ 1 root root 64 Apr 6 19:16 2 -> /root/tmp/test.log
root@dev8:~/tmp# head test.log
1649243765.043044
1649243766.0584888
1649243767.1079023
声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.