What is the best way to capture output from a process using python?
I am using python's subprocess module to start a new process. I would like to capture the output of the new process in real time so I can do things with it (display it, parse it, etc.). I have seen many examples of how this can be done: some use custom file-like objects, some use threading, and some attempt to read the output until the process has completed.
File-Like Objects Example (click me) — supplies its own file-like values for stdin, stdout and stderr.

Threading Example (click me) — uses threads to read the stdout and stderr values.

Read Output Example (see below)
The example which makes the most sense to me is to read stdout and stderr until the process has finished. Here is some example code:
import subprocess
import sys

# Start a process which prints the options to the python program.
process = subprocess.Popen(
    ["python", "-h"],
    bufsize=1,
    stdin=subprocess.PIPE,
    stdout=subprocess.PIPE,
    stderr=subprocess.PIPE,
    universal_newlines=True,  # text mode, so readline() returns "" at EOF
)

# While the process is running, display the output to the user.
while True:
    # Read standard output data.
    for stdout_line in iter(process.stdout.readline, ""):
        # Display standard output data.
        sys.stdout.write(stdout_line)
    # Read standard error data.
    for stderr_line in iter(process.stderr.readline, ""):
        # Display standard error data.
        sys.stderr.write(stderr_line)
    # If the process is complete - exit loop.
    if process.poll() is not None:
        break
My question is:

Q. Is there a recommended approach for capturing the output of a process using python?
First, your design is a bit silly, since you can do the same thing like this:
process = subprocess.Popen(
    ["python", "-h"],
    bufsize=1,
    stdout=sys.stdout,
    stderr=sys.stderr
)
… or, even better:
process = subprocess.Popen(
    ["python", "-h"],
    bufsize=1
)
However, I'll assume that's just a toy example, and you might want to do something more useful.
The main problem with your design is that it won't read anything from stderr until stdout is done.
Imagine you're driving an MP3 player that prints each track name to stdout, and logging info to stderr, and you want to play 10 songs. Do you really want to wait 30 minutes before displaying any of the logging to your users?
If that is acceptable, then you might as well just use communicate, which takes care of all of the headaches for you.
Plus, even if it's acceptable for your model, are you sure you can queue up that much unread data in the pipe without it blocking the child? On every platform?
Just breaking up the loop to alternate between the two won't help, because you could end up blocking on stdout.readline() for 5 minutes while stderr is piling up.
So that's why you need some way to read from both at once.
How do you read from two pipes at once?
This is the same problem (but smaller) as handling 1000 network clients at once, and it has the same solutions: threading, or multiplexing (and the various hybrids, like doing green threads on top of a multiplexor and event loop, or using a threaded proactor, etc.).
The best sample code for the threaded version is communicate from the 3.2+ source code. It's a little complicated, but if you want to handle all of the edge cases properly on both Windows and Unix there's really no avoiding a bit of complexity.
For multiplexing, you can use the select module, but keep in mind that this only works on Unix (you can't select on pipes on Windows), it's buggy before 3.2 (or without the subprocess32 backport), and to really get all the edge cases right you need to add a signal handler to your select. Unless you really, really don't want to use threading, this is the harder answer.
But the easy answer is to use someone else's implementation. There are a dozen or more modules on PyPI specifically for async subprocesses. Alternatively, if you already have a good reason to write your app around an event loop, just about every modern event-loop-driven async networking library (including the stdlib's asyncio) includes subprocess support out of the box that works on both Unix and Windows.
Is there a recommended approach for capturing the output of a process using python?
It depends on who you're asking; a thousand Python developers might have a thousand different answers… or at least half a dozen. If you're asking what the core devs would recommend, I can take a guess:
If you don't need to capture it asynchronously, use communicate (but make sure to upgrade to at least 3.2 for important bug fixes). If you do need to capture it asynchronously, use asyncio (which requires 3.4).