
Cannot get subprocess return code in python3

I'm trying to write something like a supervisor for my Python daemon process, and I found that the same code works in Python 2 but not in Python 3.

I've reduced the problem to this minimal example.

daemon.py

#!/usr/bin/env python

import signal
import sys
import os


def stop(*args, **kwargs):
    print('daemon exited', os.getpid())
    sys.exit(0)


signal.signal(signal.SIGTERM, stop)

print('daemon started', os.getpid())

while True:
    pass

supervisor.py

import os
import signal
import subprocess

from time import sleep


parent_pid = os.getpid()
commands = [
    [
        './daemon.py'
    ]
]
popen_list = []
for command in commands:
    popen = subprocess.Popen(command, preexec_fn=os.setsid)
    popen_list.append(popen)


def stop_workers(*args, **kwargs):
    for popen in popen_list:
        print('send_signal', popen.pid)
        popen.send_signal(signal.SIGTERM)

        while True:
            popen_return_code = popen.poll()
            if popen_return_code is not None:
                break
            sleep(5)


signal.signal(signal.SIGTERM, stop_workers)

for popen in popen_list:
    print('wait_main', popen.wait())

If you run supervisor.py and then call kill -15 on its pid, it hangs in an infinite loop, because popen_return_code is always None. I discovered that this is basically caused by the threading.Lock that was added around the waitpid operation (source), but how can I rewrite the code so that it handles the child's exit correctly?

This is an interesting case.

I've spent a few hours trying to figure out why this happens, and the only explanation I have come up with so far is that the implementation of wait() and poll() changed between Python 2.7 and Python 3.

Looking at the Python 3 implementation in subprocess.py, we can see that the wait() method of a Popen object acquires a lock; see

https://github.com/python/cpython/blob/master/Lib/subprocess.py#L1402 .

While wait() holds this lock, any poll() call cannot do its job: it fails to acquire the lock and returns None without checking the child; see

https://github.com/python/cpython/blob/master/Lib/subprocess.py#L1355

and the comment there:

"Something else is busy calling waitpid. Don't allow two at once. We know nothing yet."

There is no such lock in Python 2.7's subprocess.py, so this looks like the reason why the code works in Python 2.7 and not in Python 3.
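To make the mechanism concrete, here is a minimal sketch of the behaviour described above. ToyPopen is a hypothetical stand-in written for illustration, not the real CPython code; only the name _waitpid_lock and the non-blocking acquire mirror what subprocess.py does. It shows why poll() keeps returning None for as long as wait() holds the lock, which is exactly the situation a signal handler finds itself in when it interrupts wait().

```python
import threading


class ToyPopen:
    """Simplified, hypothetical model of Popen's locking; not the real implementation."""

    def __init__(self):
        self._waitpid_lock = threading.Lock()
        self.returncode = None

    def poll(self):
        # Non-blocking acquire: if something else holds the lock, give up
        # and report "still running" (None), whatever the child's real state.
        if not self._waitpid_lock.acquire(blocking=False):
            return None
        try:
            # Pretend waitpid() reported that the child exited cleanly.
            self.returncode = 0
            return self.returncode
        finally:
            self._waitpid_lock.release()


toy = ToyPopen()

# Simulate wait() being in progress in the main flow: the lock is held.
toy._waitpid_lock.acquire()
result_during_wait = toy.poll()  # what a signal handler would see
toy._waitpid_lock.release()

# Once wait() is no longer holding the lock, poll() can do its job.
result_after_wait = toy.poll()

print(result_during_wait, result_after_wait)
```

Because the signal handler runs in the same thread that is blocked inside wait(), the lock is never released while the handler loops, so the loop in the question can never finish.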

However, I don't see a reason to call poll() inside the signal handler. Try rewriting your supervisor.py as follows; this should work as expected on both Python 3 and Python 2.7:

supervisor.py

import os
import signal
import subprocess

from time import sleep


parent_pid = os.getpid()
commands = [
    [
        './daemon.py'
    ]
]
popen_list = []
for command in commands:
    popen = subprocess.Popen(command, preexec_fn=os.setsid)
    popen_list.append(popen)


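def stop_workers(*args, **kwargs):
    # Only forward SIGTERM here. The handler never calls poll() or wait(),
    # so it cannot collide with the wait() already running in the main flow.
    for popen in popen_list:
        print('send_signal', popen.pid)
        popen.send_signal(signal.SIGTERM)


signal.signal(signal.SIGTERM, stop_workers)

# The main flow is the only place that waits for the children to exit.
for popen in popen_list:
    print('wait_main', popen.wait())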

Hope this helps

In general I agree with the answer from @risboo6909, but I also have some thoughts on how to fix this situation.

  1. You can replace subprocess.Popen with psutil.Popen.
  2. In the main flow, instead of calling popen.wait(), you can simply run an infinite loop, because the process will exit in the signal handler.
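A variation on the second idea, sketched here under a few assumptions: the handler only sets a flag, and the main loop both forwards SIGTERM and calls poll(), so nothing is ever blocked inside wait(). The names CHILD_CMD, request_stop and supervise are hypothetical, the child is a sleeping Python one-liner standing in for ./daemon.py, start_new_session=True is used in place of preexec_fn=os.setsid, and the os.kill() call merely simulates running kill -15 on the supervisor. This is a POSIX-only sketch, not a complete supervisor.

```python
import os
import signal
import subprocess
import sys
import time

# Hypothetical stand-in for ./daemon.py: a child that sleeps until terminated.
CHILD_CMD = [sys.executable, "-c", "import time; time.sleep(60)"]

stop_requested = False


def request_stop(signum, frame):
    # Keep the handler tiny: only record that a stop was requested.
    global stop_requested
    stop_requested = True


def supervise(children, poll_interval=0.1):
    """Poll children from the main flow; forward SIGTERM once a stop is requested."""
    signalled = False
    while True:
        if stop_requested and not signalled:
            for child in children:
                child.send_signal(signal.SIGTERM)
            signalled = True
        # poll() works here because nothing is blocked inside wait().
        if all(child.poll() is not None for child in children):
            return [child.returncode for child in children]
        time.sleep(poll_interval)


signal.signal(signal.SIGTERM, request_stop)
children = [subprocess.Popen(CHILD_CMD, start_new_session=True)]

# Simulate `kill -15 <supervisor pid>` so the sketch terminates on its own.
os.kill(os.getpid(), signal.SIGTERM)

exit_codes = supervise(children)
print("exit codes", exit_codes)
```

In a real supervisor you would drop the os.kill() line and let the signal arrive from outside. A child killed by a signal reports a negative return code equal to minus the signal number.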

