
What is the best way to capture output from a process using Python?

I am using Python's subprocess module to start a new process. I would like to capture the output of the new process in real time so I can do things with it (display it, parse it, etc.). I have seen many examples of how this can be done: some use custom file-like objects, some use threading, and some attempt to read the output until the process has completed.

File-like objects example

  • I would prefer not to use custom file-like objects because I want to allow users to supply their own values for stdin, stdout and stderr.

Threading example

  • I do not really understand why threading is required, so I am reluctant to follow this example. If someone can explain why the threading example makes sense, I would be happy to listen. However, this example also restricts users from supplying their own stdout and stderr values.

Read output example (see below)

The example which makes the most sense to me is to read stdout and stderr until the process has finished. Here is some example code:

import subprocess
import sys

# Start a process which prints the options to the python program.
process = subprocess.Popen(
    ["python", "-h"],
    bufsize=1,
    stdin=subprocess.PIPE,
    stdout=subprocess.PIPE,
    stderr=subprocess.PIPE,
    # Text mode: readline() returns str, so "" marks end of file below
    # (and line buffering via bufsize=1 only applies in text mode).
    universal_newlines=True,
)

# While the process is running, display the output to the user.
while True:

    # Read standard output data (runs until stdout reaches end of file).
    for stdout_line in iter(process.stdout.readline, ""):

        # Display standard output data.
        sys.stdout.write(stdout_line)

    # Read standard error data (only reached once stdout is exhausted).
    for stderr_line in iter(process.stderr.readline, ""):

        # Display standard error data.
        sys.stderr.write(stderr_line)

    # If the process is complete - exit loop.
    if process.poll() is not None:
        break

My question is:

Q. Is there a recommended approach for capturing the output of a process using Python?

First, your design is a bit silly, since you can do the same thing like this:

process = subprocess.Popen(
                           ["python", "-h"],
                           bufsize=1,
                           stdout=sys.stdout,
                           stderr=sys.stderr
                           )

… or, even better:

process = subprocess.Popen(
                           ["python", "-h"],
                           bufsize=1
                           )

However, I'll assume that's just a toy example, and you might want to do something more useful.


The main problem with your design is that it won't read anything from stderr until stdout is done.

Imagine you're driving an MP3 player that prints each track name to stdout and logging info to stderr, and you want to play 10 songs. Do you really want to wait 30 minutes before displaying any of the logging to your users?

If that is acceptable, then you might as well just use communicate, which takes care of all of the headaches for you.

Plus, even if it's acceptable for your model, are you sure you can queue up that much unread data in the pipe without it blocking the child? On every platform?
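As a minimal sketch of the communicate route, assuming you only need the output once the child has exited: communicate drains stdout and stderr together and buffers them in memory, so the child is not left blocked on a full pipe. The helper name run_and_capture is hypothetical, not part of the subprocess module.

```python
import subprocess
import sys


def run_and_capture(args):
    """Run args to completion and return (returncode, stdout, stderr) as text.

    Hypothetical helper for illustration; everything is held in memory,
    so this suits modest amounts of output only.
    """
    process = subprocess.Popen(
        args,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,  # decode bytes to str
    )
    # communicate() reads both pipes until EOF, then waits for the child.
    out, err = process.communicate()
    return process.returncode, out, err
```

For example, run_and_capture([sys.executable, "-h"]) returns the interpreter's help text in one piece, but nothing is available until the process finishes.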

Just breaking up the loop to alternate between the two won't help, because you could end up blocking on stdout.readline() for 5 minutes while stderr is piling up.

So that's why you need some way to read from both at once.


How do you read from two pipes at once?

This is the same problem (but smaller) as handling 1000 network clients at once, and it has the same solutions: threading, or multiplexing (and the various hybrids, like doing green threads on top of a multiplexor and event loop, or using a threaded proactor, etc.).

The best sample code for the threaded version is communicate from the 3.2+ source code. It's a little complicated, but if you want to handle all of the edge cases properly on both Windows and Unix, there's really no avoiding a bit of complexity.
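To make the threaded idea concrete, here is a much-simplified sketch (not the stdlib's communicate code, and without its edge-case handling): one thread per pipe pushes lines into a shared queue, and the main thread consumes them in arrival order. The names _pump and stream_output are hypothetical.

```python
import queue
import subprocess
import sys
import threading


def _pump(stream, name, lines):
    """Forward each line from stream into the shared queue, then signal EOF."""
    with stream:
        for line in stream:
            lines.put((name, line))
    lines.put((name, None))  # sentinel: this pipe has been closed


def stream_output(args):
    """Yield (stream_name, line) pairs as the child produces them."""
    process = subprocess.Popen(
        args,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
        bufsize=1,  # line-buffered on our side (text mode only)
    )
    lines = queue.Queue()
    threads = [
        threading.Thread(target=_pump, args=(pipe, name, lines), daemon=True)
        for pipe, name in ((process.stdout, "stdout"), (process.stderr, "stderr"))
    ]
    for thread in threads:
        thread.start()

    open_pipes = len(threads)
    while open_pipes:
        name, line = lines.get()  # blocks until either pipe has data
        if line is None:
            open_pipes -= 1
        else:
            yield name, line

    for thread in threads:
        thread.join()
    process.wait()
```

Because each blocking readline lives in its own thread, a quiet stdout no longer delays stderr. The child may still buffer its own output, which this code cannot control.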

For multiplexing, you can use the select module, but keep in mind that this only works on Unix (you can't select on pipes on Windows), it's buggy without 3.2+ (or the subprocess32 backport), and to really get all the edge cases right you need to add a signal handler to your select. Unless you really, really don't want to use threading, this is the harder answer.
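A rough sketch of the multiplexing approach, using the stdlib selectors module (a higher-level wrapper over select, available from Python 3.4) rather than select directly. It is Unix only, works on raw byte chunks instead of lines, and ignores the signal-handling edge cases mentioned above. The name stream_chunks is hypothetical.

```python
import os
import selectors
import subprocess
import sys


def stream_chunks(args):
    """Yield (stream_name, bytes_chunk) as data arrives on either pipe.

    Unix only: pipes cannot be passed to select on Windows.
    """
    process = subprocess.Popen(
        args, stdout=subprocess.PIPE, stderr=subprocess.PIPE
    )
    with selectors.DefaultSelector() as selector:
        selector.register(process.stdout, selectors.EVENT_READ, "stdout")
        selector.register(process.stderr, selectors.EVENT_READ, "stderr")

        # Loop until both pipes have reached EOF and been unregistered.
        while selector.get_map():
            for key, _events in selector.select():
                # os.read returns whatever is available without waiting
                # for a full line or a full buffer.
                chunk = os.read(key.fd, 4096)
                if chunk:
                    yield key.data, chunk
                else:
                    selector.unregister(key.fileobj)
                    key.fileobj.close()
    process.wait()
```

A single thread services both pipes here, at the cost of having to reassemble lines from chunks yourself if you need line-oriented output.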

But the easy answer is to use someone else's implementation. There are a dozen or more modules on PyPI specifically for async subprocesses. Alternatively, if you already have a good reason to write your app around an event loop, just about every modern event-loop-driven async networking library (including the stdlib's asyncio) includes subprocess support out of the box that works on both Unix and Windows.


Is there a recommended approach for capturing the output of a process using Python?

It depends on who you're asking; a thousand Python developers might have a thousand different answers… or at least half a dozen. If you're asking what the core devs would recommend, I can take a guess:

If you don't need to capture it asynchronously, use communicate (but make sure to upgrade to at least 3.2 for important bug fixes). If you do need to capture it asynchronously, use asyncio (which requires 3.4).
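For the asyncio route, a minimal sketch might look like the following. It assumes Python 3.7+ for asyncio.run and async/await syntax (the 3.4-era API used a different coroutine style), and the names _read_lines, run_streaming and handle are hypothetical.

```python
import asyncio
import sys


async def _read_lines(stream, name, handle):
    """Call handle(name, line) for each line until the pipe closes."""
    while True:
        line = await stream.readline()
        if not line:  # b"" means EOF
            break
        handle(name, line.decode())


async def run_streaming(args, handle):
    """Run args, delivering stdout and stderr lines to handle concurrently."""
    process = await asyncio.create_subprocess_exec(
        *args,
        stdout=asyncio.subprocess.PIPE,
        stderr=asyncio.subprocess.PIPE,
    )
    # Both readers run on the same event loop, so neither pipe starves.
    await asyncio.gather(
        _read_lines(process.stdout, "stdout", handle),
        _read_lines(process.stderr, "stderr", handle),
    )
    return await process.wait()
```

A caller could run it with something like asyncio.run(run_streaming([sys.executable, "-h"], lambda name, line: print(name, line, end=""))), which prints each line as it arrives, tagged with the stream it came from.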

