
Python3 Flask asyncio subprocess in route hangs

I'm using Flask 1.0.2 with Python 3.6 on Ubuntu 18.04. My app should use asyncio and asyncio.create_subprocess_exec() to launch a background script, read stdout from it, and then return a status when the script is done.

I am basically trying to implement an answer from this post: Non-blocking read on a subprocess.PIPE in python

The script launches successfully, and I get all of my expected output from it, but the problem is that it never returns (meaning the Killing subprocess now line is never reached). When I check the process list (ps) from the Linux terminal, the background script has exited.

What am I doing wrong, and how can I successfully break out of the async for line in process.stdout loop?

At the top of my file, after my imports, I create my event loop:

# Create a loop to run all the tasks in.
global eventLoop ; asyncio.set_event_loop(None)
eventLoop = asyncio.new_event_loop()
asyncio.get_child_watcher().attach_loop(eventLoop)

I define my async coroutine above my route:

async def readAsyncFunctionAndKill(cmd):
    # Use global event loop
    global eventLoop

    print("[%s] Starting async Training Script ..." % (os.path.basename(__file__)))
    process = await asyncio.create_subprocess_exec(cmd,stdout=PIPE,loop=eventLoop)
    print("[%s] Starting to read stdout ..." % (os.path.basename(__file__)))
    async for line in process.stdout:
        line = line.decode(locale.getpreferredencoding(False))
        print("%s"%line, flush=True)
    print("[%s] Killing subprocess now ..." % (os.path.basename(__file__)))
    process.kill()
    print("[%s] Training process return code was: %s" % (os.path.basename(__file__), process.returncode))
    return await process.wait()  # wait for the child process to exit

And my (abbreviated) route is here:

@app.route("/train_model", methods=["GET"])
def train_new_model():
    # Use global event loop
    global eventLoop   

    with closing(eventLoop):        
        eventLoop.run_until_complete(readAsyncFunctionAndKill("s.py"))

    return jsonify("done"), 200

The "s.py" script called is marked as executable and is in the same working directory.调用的“s.py”脚本被标记为可执行文件,并且位于同一工作目录中。 The abbreviated script is shown here ( it contains several subprocesses and instantiates PyTorch classes ):缩写脚本如下所示(它包含几个子进程并实例化 PyTorch 类):

def main():

    # Ensure that swap is activated since we don't have enough RAM to train our model otherwise
    print("[%s] Activating swap now ..." % (os.path.basename(__file__)))
    subprocess.call("swapon -a", shell=True)

    # Need to initialize GPU
    print("[%s] Initializing GPU ..." % (os.path.basename(__file__)))
    device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
    defaults.device = torch.device("cuda")
    with torch.cuda.device(0):
        torch.tensor([1.]).cuda()

    print("[%s] Cuda is Available: %s - with Name: %s ..." % (os.path.basename(__file__),torch.cuda.is_available(),torch.cuda.get_device_name(0)))

    try:

        print("[%s] Beginning to train new model and replace existing model ..." % (os.path.basename(__file__)))


        # Batch size
        bs = 16
        #bs = 8

        # Create ImageBunch
        tfms = get_transforms(do_flip=True,
                              flip_vert=True,
                              max_rotate=180.,
                              max_zoom=1.1,
                              max_lighting=0.5,
                              max_warp=0.1,
                              p_affine=0.75,
                              p_lighting=0.75)

        # Create databunch using folder names as class names
        # This also applies the transforms and batch size to the data
        os.chdir(TRAINING_DIR)
        data = ImageDataBunch.from_folder("TrainingData", ds_tfms=tfms, train='.', valid_pct=0.2, bs=bs)

        ...    

        # Create a new learner with an early stop callback
        learn = cnn_learner(data, models.resnet18, metrics=[accuracy], callback_fns=[
            partial(EarlyStoppingCallback, monitor='accuracy', min_delta=0.01, patience=3)])

        ... 

        print("[%s] All done training ..." % (os.path.basename(__file__)))

        # Success
        sys.exit(0)

    except Exception as err:

        print("[%s] Error training model [ %s ] ..." % (os.path.basename(__file__),err))
        sys.exit(255)

if __name__ == "__main__":
    main()

There are several concerns here:

  • You are creating a new event loop on import, once, but you close the loop in your view. There is no need to close the loop at all; worse, a second request will now fail because the loop has been closed.

  • The asyncio event loop is not thread safe and should not be shared between threads. The vast majority of Flask deployments use threads to handle incoming requests. Your code carries echoes of how this should be handled instead, but unfortunately it is not the correct approach. E.g. asyncio.get_child_watcher().attach_loop(eventLoop) is mostly redundant, because eventLoop = asyncio.new_event_loop(), if run on the main thread, already does exactly that.

    This is the main candidate for the issues you are seeing.

  • Your code assumes that the executable is in fact present and executable. You should be handling OSError exceptions (and subclasses), because an unqualified s.py will only work if it is made executable, starts with a #! shebang line, and is found on the PATH. It won't work just because it is in the same directory, nor would you want to rely on the current working directory anyway; see the sketch after this list.

  • Your code assumes that the process closes stdout at some point. If the subprocess never closes stdout (something that happens automatically when the process exits), then your async for line in process.stdout: loop will wait forever too. Consider adding timeouts to the code to avoid getting blocked on a faulty subprocess; again, see the sketch below.
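For illustration only, here is a minimal sketch addressing the last two points: it launches the script through sys.executable with an absolute path (so neither a shebang line nor the PATH matters), handles OSError, and puts a timeout on every read. The name read_with_timeout and the 60-second value are hypothetical placeholders:

import asyncio
import locale
import os
import sys

async def read_with_timeout(script="s.py", read_timeout=60.0):
    # Build an absolute path next to this file instead of relying on
    # the current working directory or the PATH.
    script_path = os.path.join(os.path.dirname(os.path.abspath(__file__)), script)
    try:
        # Launching via sys.executable means the script needs neither a
        # shebang line nor the executable bit.
        process = await asyncio.create_subprocess_exec(
            sys.executable, script_path, stdout=asyncio.subprocess.PIPE)
    except OSError:
        print("Could not start %s" % script_path, flush=True)
        return None

    while True:
        try:
            # Give up if the child produces no output for read_timeout seconds
            line = await asyncio.wait_for(process.stdout.readline(), read_timeout)
        except asyncio.TimeoutError:
            print("No output for %s seconds, killing the child" % read_timeout, flush=True)
            process.kill()
            break
        if not line:  # EOF: the child closed stdout, usually because it exited
            break
        print(line.decode(locale.getpreferredencoding(False)), flush=True)

    return await process.wait()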

There are two sections in the Python asyncio documentation that you really would want to read when using asyncio subprocesses in a multi-threaded application: Concurrency and Multithreading (in Developing with asyncio) and Subprocess and Threads (in the asyncio subprocess documentation).

For your use case, there is really no need to run a new event loop per thread. Run a single loop, in a separate thread, as needed. If you use a loop in a separate thread then, depending on your Python version, you may need to have a running loop on the main thread as well, or use a different process watcher. Generally speaking, running an asyncio loop on the main thread in a WSGI server is not going to be easy or even possible.

So you need to run a loop, permanently, in a separate thread, and you need to use a child process watcher that works without a main-thread loop. Here is an implementation of just that; it should work for Python versions 3.6 and newer:

import asyncio
import itertools
import logging
import time
import threading

try:
    # Python 3.8 or newer has a suitable process watcher
    asyncio.ThreadedChildWatcher
except AttributeError:
    # backport the Python 3.8 threaded child watcher
    import os
    import warnings

    # Python 3.7 preferred API
    _get_running_loop = getattr(asyncio, "get_running_loop", asyncio.get_event_loop)

    class _Py38ThreadedChildWatcher(asyncio.AbstractChildWatcher):
        def __init__(self):
            self._pid_counter = itertools.count(0)
            self._threads = {}

        def is_active(self):
            return True

        def close(self):
            pass

        def __enter__(self):
            return self

        def __exit__(self, exc_type, exc_val, exc_tb):
            pass

        def __del__(self, _warn=warnings.warn):
            threads = [t for t in list(self._threads.values()) if t.is_alive()]
            if threads:
                _warn(
                    f"{self.__class__} has registered but not finished child processes",
                    ResourceWarning,
                    source=self,
                )

        def add_child_handler(self, pid, callback, *args):
            loop = _get_running_loop()
            thread = threading.Thread(
                target=self._do_waitpid,
                name=f"waitpid-{next(self._pid_counter)}",
                args=(loop, pid, callback, args),
                daemon=True,
            )
            self._threads[pid] = thread
            thread.start()

        def remove_child_handler(self, pid):
            # asyncio never calls remove_child_handler() !!!
            # The method is no-op but is implemented because
            # abstract base class requires it
            return True

        def attach_loop(self, loop):
            pass

        def _do_waitpid(self, loop, expected_pid, callback, args):
            assert expected_pid > 0

            try:
                pid, status = os.waitpid(expected_pid, 0)
            except ChildProcessError:
                # The child process is already reaped
                # (may happen if waitpid() is called elsewhere).
                pid = expected_pid
                returncode = 255
                logger.warning(
                    "Unknown child process pid %d, will report returncode 255", pid
                )
            else:
                if os.WIFSIGNALED(status):
                    returncode = -os.WTERMSIG(status)
                elif os.WIFEXITED(status):
                    returncode = os.WEXITSTATUS(status)
                else:
                    returncode = status

                if loop.get_debug():
                    logger.debug(
                        "process %s exited with returncode %s", expected_pid, returncode
                    )

            if loop.is_closed():
                logger.warning("Loop %r that handles pid %r is closed", loop, pid)
            else:
                loop.call_soon_threadsafe(callback, pid, returncode, *args)

            self._threads.pop(expected_pid)

    # add the watcher to the loop policy
    asyncio.get_event_loop_policy().set_child_watcher(_Py38ThreadedChildWatcher())

__all__ = ["EventLoopThread", "get_event_loop", "stop_event_loop", "run_coroutine"]

logger = logging.getLogger(__name__)

class EventLoopThread(threading.Thread):
    loop = None
    _count = itertools.count(0)

    def __init__(self):
        name = f"{type(self).__name__}-{next(self._count)}"
        super().__init__(name=name, daemon=True)

    def __repr__(self):
        loop, r, c, d = self.loop, False, True, False
        if loop is not None:
            r, c, d = loop.is_running(), loop.is_closed(), loop.get_debug()
        return (
            f"<{type(self).__name__} {self.name} id={self.ident} "
            f"running={r} closed={c} debug={d}>"
        )

    def run(self):
        self.loop = loop = asyncio.new_event_loop()
        asyncio.set_event_loop(loop)

        try:
            loop.run_forever()
        finally:
            try:
                shutdown_asyncgens = loop.shutdown_asyncgens()
            except AttributeError:
                pass
            else:
                loop.run_until_complete(shutdown_asyncgens)
            loop.close()
            asyncio.set_event_loop(None)

    def stop(self):
        loop, self.loop = self.loop, None
        if loop is None:
            return
        loop.call_soon_threadsafe(loop.stop)
        self.join()

_lock = threading.Lock()
_loop_thread = None

def get_event_loop():
    global _loop_thread
    if _loop_thread is None:
        with _lock:
            if _loop_thread is None:
                _loop_thread = EventLoopThread()
                _loop_thread.start()
                # give the thread up to a second to produce a loop
                deadline = time.time() + 1
                while not _loop_thread.loop and time.time() < deadline:
                    time.sleep(0.001)

    return _loop_thread.loop

def stop_event_loop():
    global _loop_thread
    with _lock:
        if _loop_thread is not None:
            _loop_thread.stop()
            _loop_thread = None

def run_coroutine(coro):
    return asyncio.run_coroutine_threadsafe(coro, get_event_loop())

The above is the same general 'run async with Flask' solution as I posted for Make a Python asyncio call from a Flask route, but with the addition of the ThreadedChildWatcher backport.

You can then use the loop returned from get_event_loop() to run child processes, via the run_coroutine() helper (a thin wrapper around asyncio.run_coroutine_threadsafe()):

import asyncio
import concurrent.futures
import locale
import logging

logger = logging.getLogger(__name__)


def get_command_output(cmd, timeout=None):
    encoding = locale.getpreferredencoding(False)

    async def run_async():
        try:
            process = await asyncio.create_subprocess_exec(
                cmd, stdout=asyncio.subprocess.PIPE)
        except OSError:
            logger.exception("Process %s could not be started", cmd)
            return
        
        async for line in process.stdout:
            line = line.decode(encoding)
            # TODO: actually do something with the data.
            print(line, flush=True)

        process.kill()
        returncode = await process.wait()  # wait for the child process to exit
        logger.debug("Process for %s exited with returncode %s", cmd, returncode)
        return returncode

    future = run_coroutine(run_async())
    result = None
    try:
        result = future.result(timeout)
    except concurrent.futures.TimeoutError:
        logger.warning("The child process took too long, cancelling the task...")
        future.cancel()
    except Exception:
        logger.exception("The child process raised an exception")
    return result

Note that the above function takes an optional timeout, in seconds: the maximum amount of time you'll wait for the subprocess to complete.
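For completeness, here is a minimal sketch of how the route from the question might call this helper. The one-hour timeout and the absolute path to the training script are placeholder assumptions:

from flask import Flask, jsonify

app = Flask(__name__)

# Hypothetical absolute path to the training script
TRAINING_SCRIPT = "/path/to/s.py"

@app.route("/train_model", methods=["GET"])
def train_new_model():
    # Blocks this request (but not the whole server) until the
    # subprocess finishes or the one-hour timeout expires.
    returncode = get_command_output(TRAINING_SCRIPT, timeout=3600)
    if returncode == 0:
        return jsonify("done"), 200
    return jsonify("training failed"), 500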
