What is the best way to capture output from a process using python?
I am using python's subprocess module to start a new process. I would like to capture the output of the new process in real time so I can do things with it (display it, parse it, etc.). I have seen many examples of how this can be done: some use custom file-like objects, some use threading, and some attempt to read the output until the process has completed.
File-Like Objects Example (click me) — supplies its own file-like values for stdin, stdout and stderr.

Threading Example (click me) — uses threads to read the stdout and stderr values.

Read Output Example (see below)
The example which makes the most sense to me is to read stdout and stderr until the process has finished. Here is some example code:
import subprocess
import sys

# Start a process which prints the options to the python program.
process = subprocess.Popen(
    ["python", "-h"],
    bufsize=1,
    stdin=subprocess.PIPE,
    stdout=subprocess.PIPE,
    stderr=subprocess.PIPE,
    universal_newlines=True,  # text mode, so readline() returns "" at EOF
)

# While the process is running, display the output to the user.
while True:
    # Read standard output data.
    for stdout_line in iter(process.stdout.readline, ""):
        # Display standard output data.
        sys.stdout.write(stdout_line)
    # Read standard error data.
    for stderr_line in iter(process.stderr.readline, ""):
        # Display standard error data.
        sys.stderr.write(stderr_line)
    # If the process is complete - exit loop.
    if process.poll() is not None:
        break
My question is:

Q. Is there a recommended approach for capturing the output of a process using python?
First, your design is a bit silly, since you can do the same thing like this:
process = subprocess.Popen(
    ["python", "-h"],
    bufsize=1,
    stdout=sys.stdout,
    stderr=sys.stderr
)
… or, even better:
process = subprocess.Popen(
    ["python", "-h"],
    bufsize=1
)
However, I'll assume that's just a toy example, and you might want to do something more useful.
The main problem with your design is that it won't read anything from stderr until stdout is done.
Imagine you're driving an MP3 player that prints each track name to stdout, and logging info to stderr, and you want to play 10 songs. Do you really want to wait 30 minutes before displaying any of the logging to your users?
If that is acceptable, then you might as well just use communicate, which takes care of all of the headaches for you.
Plus, even if it's acceptable for your model, are you sure you can queue up that much unread data in the pipe without it blocking the child? On every platform?
Just breaking up the loop to alternate between the two won't help, because you could end up blocking on stdout.readline() for 5 minutes while stderr is piling up.
So that's why you need some way to read from both at once.
How do you read from two pipes at once?
This is the same problem (but smaller) as handling 1000 network clients at once, and it has the same solutions: threading, or multiplexing (and the various hybrids, like doing green threads on top of a multiplexor and event loop, or using a threaded proactor, etc.).
The best sample code for the threaded version is communicate from the 3.2+ source code. It's a little complicated, but if you want to handle all of the edge cases properly on both Windows and Unix there's really no avoiding a bit of complexity.
For multiplexing, you can use the select module, but keep in mind that this only works on Unix (you can't select on pipes on Windows), it's buggy before 3.2 (or without the subprocess32 backport), and to really get all the edge cases right you need to add a signal handler to your select. Unless you really, really don't want to use threading, this is the harder answer.
But the easy answer is to use someone else's implementation. There are a dozen or more modules on PyPI specifically for async subprocesses. Alternatively, if you already have a good reason to write your app around an event loop, just about every modern event-loop-driven async networking library (including the stdlib's asyncio) includes subprocess support out of the box that works on both Unix and Windows.
Is there a recommended approach for capturing the output of a process using python?
It depends on who you're asking; a thousand Python developers might have a thousand different answers… or at least half a dozen. If you're asking what the core devs would recommend, I can take a guess:
If you don't need to capture it asynchronously, use communicate (but make sure to upgrade to at least 3.2 for important bug fixes). If you do need to capture it asynchronously, use asyncio (which requires 3.4).