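Reading output from a subprocess with Python

Question

Context

I am using the subprocess module to start a process from Python. I want to be able to access the output (stdout, stderr) as soon as it is written/buffered.

The solution must support Windows 7. I need a solution for Unix systems too, but I suspect the Windows case is harder to solve.
The solution should support Python 2.6. I am currently restricted to Python 2.6, but solutions using later versions of Python are still appreciated.
The solution should not use third-party libraries. Ideally I would love a solution using the standard library, but I am open to suggestions.
The solution must work for just about any process. Assume there is no control over the process being executed.

Child process

For example, imagine I want to run a Python file called counter.py via subprocess. The contents of counter.py are as follows: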

import sys

for index in range(10):

    # Write data to standard out.
    sys.stdout.write(str(index))

    # Push buffered data to disk.
    sys.stdout.flush()

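Parent process

The parent process responsible for executing the counter.py example is as follows:

import subprocess

# Command line for the child process.
cmd = ['python', 'counter.py']

process = subprocess.Popen(
    cmd,
    bufsize=1,
    stdout=subprocess.PIPE,
    stderr=subprocess.PIPE,
    )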

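The problem

Using the counter.py example I can access the data before the process has completed. This is great! This is exactly what I want. However, removing the sys.stdout.flush() call prevents the data from being accessed at the time I want it. This is bad! This is exactly what I don't want. My understanding is that the flush() call forces the data to be written to disk, and that before the data is written to disk it exists only in a buffer. Remember, I want to be able to run just about any process. I do not expect the process to perform this kind of flushing, but I still expect the data to be available in real time (or close to it). Is there a way to achieve this?

A quick note about the parent process. You may notice I am using bufsize=1 for line buffering. I was hoping this would cause a flush to disk for every line, but it doesn't seem to work that way. How does this argument work?

You will also notice I am using subprocess.PIPE. This is because it appears to be the only value which produces IO objects between the parent and child processes. I came to this conclusion by looking at the Popen._get_handles method in the subprocess module (I'm referring to the Windows definition here). There are two important variables, c2pread and c2pwrite, which are set based on the stdout value passed to the Popen constructor. For instance, if stdout is not set, the c2pread variable is not set. This is also the case when using file descriptors and file-like objects. I don't really know whether this is significant, but my gut instinct tells me I would want both read and write IO objects for what I am trying to achieve, which is why I chose subprocess.PIPE. I would be very grateful if someone could explain this in more detail. Likewise, if there is a compelling reason to use something other than subprocess.PIPE, I am all ears.

Methods for retrieving data from the child process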

import time
import subprocess
import threading
import Queue


class StreamReader(threading.Thread):
    """
    Threaded object used for reading process output stream (stdout, stderr).   
    """

    def __init__(self, stream, queue, *args, **kwargs):
        super(StreamReader, self).__init__(*args, **kwargs)
        self._stream = stream
        self._queue = queue

        # Event used to terminate thread. This way we will have a chance to 
        # tie up loose ends. 
        self._stop = threading.Event()

    def stop(self):
        """
        Stop thread. Call this function to terminate the thread. 
        """
        self._stop.set()

    def stopped(self):
        """
        Check whether the thread has been terminated.
        """
        return self._stop.isSet()

    def run(self):
        while True:
            # Flush buffered data (not sure this actually works?)
            self._stream.flush()

            # Read available data.
            for line in iter(self._stream.readline, b''):
                self._queue.put(line)

            # Breather.
            time.sleep(0.25)

            # Check whether thread has been terminated.
            if self.stopped():
                break


cmd = ['python', 'counter.py']

process = subprocess.Popen(
    cmd,
    bufsize=1,
    stdout=subprocess.PIPE,
    )

stdout_queue = Queue.Queue()
stdout_reader = StreamReader(process.stdout, stdout_queue)
stdout_reader.daemon = True
stdout_reader.start()

# Read standard out of the child process whilst it is active.  
while True:

    # Attempt to read available data.  
    try:
        line = stdout_queue.get(timeout=0.1)
        print '%s' % line

    # If data was not read within time out period. Continue. 
    except Queue.Empty:
        # No data currently available.
        pass

    # Check whether child process is still active.
    if process.poll() is not None:

        # Process is no longer active.
        break

# Process is no longer active. Nothing more to read. Stop reader thread.
stdout_reader.stop()

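Here I am performing the logic which reads standard out from the child process in a thread. This allows for the scenario in which the read blocks until data is available. Instead of waiting for some potentially long period of time, we check whether there is data available to be read within a timeout period, and continue looping if there is not.

I have also tried another approach which uses a kind of non-blocking read. This approach uses the ctypes module to access Windows system calls. Please note that I don't fully understand what I am doing here; I have simply tried to make sense of some example code I have seen in other posts. In any case, the following snippet doesn't solve the buffering problem. My understanding is that it's just another way to combat a potentially long read time.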

import os
import subprocess

import ctypes
import ctypes.wintypes
import msvcrt

cmd = ['python', 'counter.py']

process = subprocess.Popen(
    cmd,
    bufsize=1,
    stdout=subprocess.PIPE,
    )


def read_output_non_blocking(stream):
    data = ''
    available_bytes = 0

    c_read = ctypes.c_ulong()
    c_available = ctypes.c_ulong()
    c_message = ctypes.c_ulong()

    fileno = stream.fileno()
    handle = msvcrt.get_osfhandle(fileno)

    # Read available data.
    buffer_ = None
    bytes_ = 0
    status = ctypes.windll.kernel32.PeekNamedPipe(
        handle,
        buffer_,
        bytes_,
        ctypes.byref(c_read),
        ctypes.byref(c_available),
        ctypes.byref(c_message),
        )

    if status:
        available_bytes = int(c_available.value)

    if available_bytes > 0:
        data = os.read(fileno, available_bytes)
        print data

    return data

while True:

    # Read standard out for child process.
    stdout = read_output_non_blocking(process.stdout)
    print stdout

    # Check whether child process is still active.
    if process.poll() is not None:

        # Process is no longer active.
        break

Comments are much appreciated.

Cheers

Answer 1

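The problem here is buffering by the child process. Your subprocess code already works as well as it can, but if you have a child process that buffers its output, there is nothing that subprocess pipes can do about it.

I cannot stress this enough: the buffering delays you see are the responsibility of the child process, and how it handles buffering has nothing to do with the subprocess module.

You already discovered this; it is why adding sys.stdout.flush() in the child process makes the data show up sooner. The child process uses buffered I/O (a memory cache that collects written data) before sending it down the sys.stdout pipe¹.

Python automatically uses line buffering when sys.stdout is connected to a terminal; the buffer is flushed whenever a newline is written. When using pipes, sys.stdout is not connected to a terminal and a fixed-size buffer is used instead.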
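The effect can be observed from the parent side. The following is a minimal sketch, assuming Python 3 and a hypothetical inline child script (CHILD) that writes three lines without flushing and sleeps after each one. The parent asks for an unbuffered pipe with bufsize=0, yet the lines should still arrive together when the child exits, because the delay comes from the child's own buffer.

```python
import os
import subprocess
import sys
import time

# Hypothetical child: three writes, no flush, about 0.9 seconds of sleeping in total.
CHILD = (
    "import sys, time\n"
    "for i in range(3):\n"
    "    sys.stdout.write('%d\\n' % i)\n"
    "    time.sleep(0.3)\n"
)


def arrival_times():
    """Run the child and return (seconds since start, line) for every line read."""
    env = dict(os.environ)
    env.pop("PYTHONUNBUFFERED", None)  # make sure the child uses its default buffering
    start = time.time()
    process = subprocess.Popen(
        [sys.executable, "-c", CHILD],
        bufsize=0,  # unbuffered on the parent side only; the child is unaffected
        stdout=subprocess.PIPE,
        env=env,
    )
    arrivals = []
    for line in iter(process.stdout.readline, b""):
        arrivals.append((time.time() - start, line))
    process.stdout.close()
    process.wait()
    return arrivals


buffered = arrival_times()
for elapsed, line in buffered:
    print("%.2fs %r" % (elapsed, line))
```

The timestamps printed for the three lines should be nearly identical and close to the child's total run time, whatever value of bufsize the parent uses.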

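Now, the Python child process can be told to handle buffering differently; you can set an environment variable or use a command-line switch to alter how it uses buffering for sys.stdout (and sys.stderr and sys.stdin). From the Python command line documentation:

-u
Force stdin, stdout and stderr to be totally unbuffered. On systems where it matters, also put stdin, stdout and stderr in binary mode.

[...]

PYTHONUNBUFFERED
If this is set to a non-empty string it is equivalent to specifying the -u option.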
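Both options can be applied from the parent when it launches a Python child. This is a rough sketch, assuming Python 3; the inline child script (CHILD) and the helper read_with_timestamps are made up for illustration. The child never calls flush(), so any line that arrives early is due to the switch or the environment variable.

```python
import os
import subprocess
import sys
import time

# Hypothetical child: writes a line, then sleeps, three times, never flushing.
CHILD = (
    "import sys, time\n"
    "for i in range(3):\n"
    "    sys.stdout.write('%d\\n' % i)\n"
    "    time.sleep(0.3)\n"
)


def read_with_timestamps(process):
    """Read the child's stdout line by line, recording when each line arrives."""
    start = time.time()
    stamped = []
    for line in iter(process.stdout.readline, b""):
        stamped.append((time.time() - start, line.strip()))
    process.stdout.close()
    process.wait()
    return stamped


# Option 1: the -u command-line switch.
via_switch = read_with_timestamps(
    subprocess.Popen([sys.executable, "-u", "-c", CHILD], stdout=subprocess.PIPE)
)

# Option 2: the PYTHONUNBUFFERED environment variable.
env = dict(os.environ, PYTHONUNBUFFERED="1")
via_env = read_with_timestamps(
    subprocess.Popen([sys.executable, "-c", CHILD], stdout=subprocess.PIPE, env=env)
)

print(via_switch)
print(via_env)
```

In both runs the lines should be spread out over the child's lifetime instead of arriving in one burst at exit. This only helps when the child is itself a Python interpreter.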

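If you are dealing with child processes that are not Python processes and you experience buffering issues with those, you will need to look at the documentation of those processes to see whether they can be switched to unbuffered I/O or to a more suitable buffering strategy.

One thing you could try is to use the script -c command to give the child process a pseudo-terminal. This is a POSIX tool, however, and is probably not available on Windows.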
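The same pseudo-terminal idea can be approximated with the standard library's pty module in place of script -c. This is a swapped-in technique, not what the answer prescribes, and it is POSIX-only. The sketch below assumes Python 3; run_in_pty is a hypothetical helper name. The child here simply reports whether it believes its stdout is a terminal.

```python
import os
import pty
import subprocess
import sys


def run_in_pty(cmd):
    """Run cmd with stdout/stderr attached to a pseudo-terminal; return raw bytes."""
    master_fd, slave_fd = pty.openpty()
    process = subprocess.Popen(cmd, stdout=slave_fd, stderr=slave_fd)
    os.close(slave_fd)  # the child holds its own copy; ours would keep the pty open

    chunks = []
    while True:
        try:
            data = os.read(master_fd, 1024)
        except OSError:
            break  # Linux reports an I/O error once the child side is closed
        if not data:
            break  # other platforms may signal end-of-file with an empty read
        chunks.append(data)

    os.close(master_fd)
    process.wait()
    return b"".join(chunks)


output = run_in_pty([sys.executable, "-c", "import sys; print(sys.stdout.isatty())"])
print(repr(output))
```

Because the child sees a terminal, programs that choose their buffering mode based on that check should fall back to line buffering. Note that a terminal usually translates newlines to carriage return plus newline, so the captured bytes may need cleaning up.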

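¹ It should be noted that when flushing a pipe, no data is "written to disk"; all data remains entirely in memory. I/O buffers are just memory caches that get the best performance out of I/O by handling data in larger chunks. Only if you have a disk-based file object would fileobj.flush() cause it to push any buffers to the OS, which usually means that the data is indeed written to disk.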
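A small experiment makes this footnote concrete. It is a sketch assuming Python 3 on a POSIX system (select.select() does not accept pipes on Windows); the helper name readable is made up. A buffered writer wraps one end of an OS pipe, and the other end only becomes readable after flush(), with no disk involved at any point.

```python
import os
import select


def readable(fd):
    """Return True if fd has data waiting, without blocking."""
    ready, _, _ = select.select([fd], [], [], 0)
    return bool(ready)


read_fd, write_fd = os.pipe()
writer = os.fdopen(write_fd, "wb")  # buffered by default, much like a piped sys.stdout

writer.write(b"0123")
before_flush = readable(read_fd)  # the bytes are still in the writer's memory buffer

writer.flush()
after_flush = readable(read_fd)  # the bytes have been handed to the OS pipe
data = os.read(read_fd, 1024)

writer.close()
os.close(read_fd)
print(before_flush, after_flush, data)
```

Before the flush the reader sees nothing, and afterwards it sees the four bytes, which matches the behaviour observed with counter.py.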

Answer 2

expect has a command called 'unbuffer':

http://expect.sourceforge.net/example/unbuffer.man.html

This will disable buffering for any command.
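From Python, unbuffer is used by placing it in front of the command line passed to subprocess. The helper below is a hypothetical sketch, assuming Python 3.3 or later for shutil.which; it falls back to the plain command when unbuffer is not installed, as is likely on Windows.

```python
import shutil


def with_unbuffer(cmd):
    """Prefix cmd with 'unbuffer' when that tool is on PATH; otherwise return cmd as is."""
    if shutil.which("unbuffer") is None:
        return list(cmd)
    return ["unbuffer"] + list(cmd)


wrapped = with_unbuffer(["python", "counter.py"])
print(wrapped)
```

The resulting list can be passed to subprocess.Popen in place of the original cmd.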


2 answers

Answer 1
10 2014-01-23 12:08:13

Answer 2
2 2014-01-26 00:29:22
