Python: asynchronously print stdout from multiple subprocesses

I'm testing a way to print the stdout of several subprocesses in Python 2.7. My setup is a main process that currently spawns three subprocesses and prints their output. Each subprocess is a for-loop that sleeps for a random amount of time and, when it wakes up, prints "Slept for X seconds".

The problem I'm seeing is that the printing appears to be synchronous. Say subprocess A sleeps for 1 second, subprocess B sleeps for 3 seconds, and subprocess C sleeps for 10 seconds. The main process stalls for the full 10 seconds while it checks whether subprocess C has any output, even though the other two have probably woken up and printed something in the meantime. This simulates a subprocess that genuinely has nothing to output for much longer than the other two.

I need a solution that works on Windows.

My code is as follows:

main_process.py

import sys
import subprocess

logfile = open('logfile.txt', 'w')
processes = [
            subprocess.Popen('python subproc_1.py', stdout=subprocess.PIPE, bufsize=1), 
            subprocess.Popen('python subproc_2.py', stdout=subprocess.PIPE, bufsize=1), 
            subprocess.Popen('python subproc_3.py', stdout=subprocess.PIPE, bufsize=1), 
        ]


while True:
    line = processes[0].stdout.readline() 
    if line != '':
        sys.stdout.write(line)
        logfile.write(line)

    line = processes[1].stdout.readline()
    if line != '':
        sys.stdout.write(line)
        logfile.write(line)

    line = processes[2].stdout.readline()
    if line != '':
        sys.stdout.write(line)
        logfile.write(line)

    #If everyone is dead, break
    if processes[0].poll() is not None and \
       processes[1].poll() is not None and \
       processes[2].poll() is not None:
        break

processes[0].wait()
processes[1].wait()

print 'Done'

subproc_1.py/subproc_2.py/subproc_3.py

import time, sys, random

sleep_time = random.random() * 3
for x in range(0, 20):
    print "[PROC1] Slept for {0} seconds".format(sleep_time)
    sys.stdout.flush()
    time.sleep(sleep_time)
    sleep_time = random.random() * 3 #this is different for each subprocess.

Update: Solution

Combining the answer below with this question, the following should work.

import sys
import subprocess
from threading import Thread

try:
    from Queue import Queue, Empty
except ImportError:
    from queue import Queue, Empty # for Python 3.x

ON_POSIX = 'posix' in sys.builtin_module_names

def enqueue_output(out, queue):
    for line in iter(out.readline, b''):
        queue.put(line)
    out.close()

if __name__ == '__main__':
    logfile = open('logfile.txt', 'w')
    processes = [
                subprocess.Popen('python subproc_1.py', stdout=subprocess.PIPE, bufsize=1), 
                subprocess.Popen('python subproc_2.py', stdout=subprocess.PIPE, bufsize=1), 
                subprocess.Popen('python subproc_3.py', stdout=subprocess.PIPE, bufsize=1), 
            ]
    q = Queue()
    threads = []
    for p in processes:
        threads.append(Thread(target=enqueue_output, args=(p.stdout, q)))

    for t in threads:
        t.daemon = True
        t.start()

    while True:
        try:
            line = q.get_nowait()
        except Empty:
            pass
        else:
            sys.stdout.write(line)
            logfile.write(line)
            logfile.flush()

        #break when all processes are done.
        if all(p.poll() is not None for p in processes):
            break

    print 'All processes done'

I'm not sure whether I need any cleanup code at the end of the while loop. If anyone has comments about it, please add them.
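On the cleanup question: the loop above breaks as soon as poll() reports that every child has exited, so lines still sitting in the queue (or not yet read by a reader thread) may never be printed. One possible way to close that gap is sketched below. It is not part of the original solution: the function name collect_output, the None sentinel, and the inline child commands are assumptions made for illustration. Each reader thread puts None on the queue when its pipe reaches EOF, and the main loop stops only after it has seen one sentinel per process.

```python
import subprocess
import sys
from threading import Thread

try:
    from Queue import Queue  # Python 2
except ImportError:
    from queue import Queue  # Python 3


def enqueue_output(out, queue):
    """Forward every line from one pipe, then announce EOF with None."""
    for line in iter(out.readline, b''):
        queue.put(line)
    out.close()
    queue.put(None)  # sentinel (assumption): this stream is finished


def collect_output(commands):
    """Run all commands and return their output lines in arrival order."""
    processes = [subprocess.Popen(cmd, stdout=subprocess.PIPE)
                 for cmd in commands]
    q = Queue()
    for p in processes:
        t = Thread(target=enqueue_output, args=(p.stdout, q))
        t.daemon = True
        t.start()

    lines = []
    open_streams = len(processes)
    while open_streams:
        line = q.get()  # blocking get: no busy-waiting on an empty queue
        if line is None:
            open_streams -= 1
        else:
            lines.append(line)

    for p in processes:
        p.wait()  # reap the children; their pipes are already at EOF
    return lines


if __name__ == "__main__":
    # Hypothetical stand-ins for subproc_1.py ... subproc_3.py
    cmds = [[sys.executable, "-c", "print('child %d')" % i] for i in range(3)]
    for line in collect_output(cmds):
        sys.stdout.write(line.decode())
```

Because the loop ends on EOF rather than on process exit, nothing queued is dropped, and the blocking q.get() avoids spinning the CPU the way a get_nowait() loop does. The sketch collects lines into a list only to keep it short; writing each line to sys.stdout and the log file inside the loop, as in the code above, works the same way.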

Each subproc script now looks similar to this (edited to make a better example):

import datetime, time, sys, random

for x in range(0, 20):
    sleep_time = random.random() * 3
    time.sleep(sleep_time)
    timestamp = datetime.datetime.fromtimestamp(time.time()).strftime('%H%M%S.%f')
    print "[{0}][PROC1] Slept for {1} seconds".format(timestamp, sleep_time)
    sys.stdout.flush()

print "[{0}][PROC1] Done".format(timestamp)
sys.stdout.flush()

Your problem comes from the fact that readline() is a blocking function; if you call it on a file object and there isn't a line waiting to be read, the call won't return until there is a line of output. So what you have now will read repeatedly from subprocesses 1, 2, and 3 in that order, pausing at each until output is ready.
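The blocking behaviour is easy to observe in isolation. The following is a minimal sketch, not code from the question: the helper name time_first_readline and the inline child command are assumptions. The child stays silent for one second, and the parent measures how long its readline() call takes to return.

```python
import subprocess
import sys
import time

# Hypothetical child: stays silent for one second, then prints a single line.
CHILD_CODE = "import time; time.sleep(1.0); print('awake')"


def time_first_readline():
    """Return (line, seconds spent blocked inside readline())."""
    proc = subprocess.Popen([sys.executable, "-c", CHILD_CODE],
                            stdout=subprocess.PIPE)
    start = time.time()
    line = proc.stdout.readline()  # blocks until the child writes a full line
    waited = time.time() - start
    proc.stdout.close()
    proc.wait()
    return line, waited


if __name__ == "__main__":
    line, waited = time_first_readline()
    print("got %r after %.2f seconds" % (line, waited))
```

The measured wait should be roughly the child's sleep time. In the question's loop, the same wait happens on each of the three pipes in turn, which is why a slow subprocess holds back output from the fast ones.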

(Edit: The OP clarified that they're on Windows, which makes the below inapplicable: on Windows, select() works only on sockets, not on pipes.)

If you want to read from whichever output stream is ready, you need to check the status of the streams in a non-blocking fashion using the select module, and then attempt reads only on those that are ready. select provides several ways of doing this, but for the sake of example we'll use select.select(). After starting your subprocesses, you'll have something like:

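import select

streams = [p.stdout for p in processes]

def output(s):
    # Write to both the console and the log file, flushing each time.
    for f in [sys.stdout, logfile]:
        f.write(s)
        f.flush()

while True:
    # Block until at least one stream has data (or has reached EOF).
    rstreams, _, _ = select.select(streams, [], [])
    for stream in rstreams:
        line = stream.readline()
        output(line)
    if all(p.poll() is not None for p in processes):
        break

# Final sweep: pick up anything written just before the children exited.
for stream in streams:
    output(stream.read())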

What select() does, when called with three lists of file objects (or file descriptors), is return three subsets of its arguments: the streams that are ready for reading, those that are ready for writing, and those that have an error condition. Thus on each iteration of the loop we check which output streams are ready to read, and iterate over just those. Then we repeat. (Note that it's important here that you're line-buffering the output; the above code assumes that if a stream is ready for reading, there's at least one full line ready to be read. If you specify different buffering, the above can block.)
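To see the "subset" behaviour on its own, here is a small sketch that uses two plain OS pipes instead of subprocesses. The helper name ready_for_reading is an assumption, and like the code above it is POSIX-only. A timeout of 0 makes select() poll and return immediately instead of waiting.

```python
import os
import select


def ready_for_reading(read_fds, timeout=0):
    """Return the subset of read_fds that has data waiting (POSIX only)."""
    readable, _, _ = select.select(read_fds, [], [], timeout)
    return readable


if __name__ == "__main__":
    r1, w1 = os.pipe()
    r2, w2 = os.pipe()
    os.write(w2, b"hello\n")  # only the second pipe has data
    print(ready_for_reading([r1, r2]) == [r2])
    for fd in (r1, w1, r2, w2):
        os.close(fd)
```

Only the descriptor with pending data comes back in the first list; the empty pipe is left out, so a subsequent read on the returned descriptor will not block.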

A further problem with your original code: when you exit the loop after poll() reports that all subprocesses have exited, you might not have read all their output. So you need to do a last sweep over the streams to read any remaining output.

Note: The example code I gave doesn't try all that hard to capture the subprocesses' output in exactly the order in which it becomes available (which is impossible to do perfectly, but can be approximated more closely than the above manages to do). It also lacks other refinements (for example, in the main loop it will continue to select on the stdout of every subprocess, even after some have already terminated, which is harmless but inefficient). It's just meant to illustrate a basic technique of non-blocking IO.
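As one possible refinement along those lines, the loop can stop watching a stream once it reaches EOF and end when no streams are left, which also removes the need for a separate final sweep. This is a sketch under assumptions, not the answer's code: pump_output and the emit callback are hypothetical names, it reads raw chunks with os.read() rather than whole lines, and it remains POSIX-only.

```python
import os
import select
import subprocess
import sys


def pump_output(processes, emit):
    """Pass each chunk of child output to emit() until all pipes hit EOF."""
    open_fds = [p.stdout.fileno() for p in processes]
    while open_fds:
        readable, _, _ = select.select(open_fds, [], [])
        for fd in readable:
            # After select() reports fd readable, os.read() returns at once
            # with whatever is available (up to 4096 bytes).
            chunk = os.read(fd, 4096)
            if not chunk:
                open_fds.remove(fd)  # EOF: stop selecting on this stream
                continue
            emit(chunk)
    for p in processes:
        p.stdout.close()
        p.wait()


if __name__ == "__main__":
    # Hypothetical stand-ins for the three subproc scripts.
    procs = [subprocess.Popen([sys.executable, "-c", "print('p%d')" % i],
                              stdout=subprocess.PIPE) for i in range(3)]
    chunks = []
    pump_output(procs, chunks.append)
    sys.stdout.write(b"".join(chunks).decode())
```

Because os.read() hands back whatever bytes are available, a chunk may end in the middle of a line; if whole lines matter (for example, to avoid interleaving partial lines in the log), the emit callback would need to buffer per stream and split on newlines.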
