
Why can `popen.stdout.readline` deadlock and what to do about it?

From the Python documentation:

Warning: Use communicate() rather than .stdin.write, .stdout.read or .stderr.read to avoid deadlocks due to any of the other OS pipe buffers filling up and blocking the child process.
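For reference, a minimal sketch of the pattern the documentation recommends, using a trivial child command purely for illustration:

```python
import subprocess
import sys

# communicate() drains stdout and stderr concurrently, so neither pipe
# can fill up and block the child while we wait for the other one.
proc = subprocess.Popen(
    [sys.executable, "-c", "import sys; print('out'); print('err', file=sys.stderr)"],
    stdout=subprocess.PIPE,
    stderr=subprocess.PIPE,
)
out, err = proc.communicate()
print(out.strip(), err.strip())
```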

I'm trying to understand why this would deadlock. For some background, I am spawning N processes in parallel:

for c in commands:
    h = subprocess.Popen(c, stdout=subprocess.PIPE, stderr=subprocess.PIPE, universal_newlines=True)
    handles.append(h)

Then printing the output of each process one by one:

for handle in handles:
    while handle.poll() is None:
        try:
            line = handle.stdout.readline()
        except UnicodeDecodeError:
            line = "((INVALID UNICODE))\n"

        sys.stdout.write(line)
    if handle.returncode != 0:
        print(handle.stdout.read(), file=sys.stdout)
    if handle.returncode != 0:
        print(handle.stderr.read(), file=sys.stderr)

Occasionally this does in fact deadlock. Unfortunately, the documentation's recommendation to use communicate() is not going to work for me, because this process could take several minutes to run, and I don't want it to appear dead during this time. It should print output in real time.

I have several options, such as changing the bufsize argument, polling in a different thread for each handle, etc. But in order to decide the best way to fix this, I think I first need to understand the fundamental reason for the deadlock. Something to do with buffer sizes, apparently, but what? I can hypothesize that maybe all of these processes share a single OS kernel object, and because I'm only draining the buffer of one of the processes, the other ones fill it up, in which case option 2 above would probably fix it. But maybe that's not even the real problem.

Can anyone shed some light on this?

The bidirectional communication between the parent and child processes uses two unidirectional pipes, one for each direction. OK, stderr is a third one, but the idea is the same.

A pipe has two ends: one for writing, one for reading. The capacity of a pipe used to be 4K and is now 64K on modern Linux. One can expect similar values on other systems. This means the writer can write to a pipe without problems up to its limit, but then the pipe gets full and a write to it blocks until the reader reads some data from the other end.
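One can probe the actual capacity on a given system with a small sketch like the following. Note the assumption: fcntl.F_GETPIPE_SZ is Linux-specific and only exposed by the fcntl module since Python 3.10; on other setups we fall back to the historic 64K figure.

```python
import fcntl
import os

def pipe_capacity() -> int:
    """Return the kernel's buffer size for a freshly created pipe."""
    r, w = os.pipe()
    try:
        if hasattr(fcntl, "F_GETPIPE_SZ"):
            return fcntl.fcntl(w, fcntl.F_GETPIPE_SZ)
        return 64 * 1024  # assumed default where the query is unavailable
    finally:
        os.close(r)
        os.close(w)

print(pipe_capacity())
```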

From the reader's view, the situation is obvious: a regular read blocks until data is available.

To summarize: a deadlock occurs when a process attempts to read from a pipe that nobody is writing to, or when it writes data larger than the pipe's capacity to a pipe that nobody is reading from.
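The "write to a pipe nobody reads" case can be demonstrated safely by switching the write end to non-blocking mode, so the kernel reports the would-block point instead of hanging (a POSIX sketch):

```python
import fcntl
import os

# Fill a pipe that nobody reads from. With O_NONBLOCK set, the kernel
# raises BlockingIOError exactly where a regular write would block forever.
r, w = os.pipe()
flags = fcntl.fcntl(w, fcntl.F_GETFL)
fcntl.fcntl(w, fcntl.F_SETFL, flags | os.O_NONBLOCK)

written = 0
try:
    while True:
        written += os.write(w, b"x" * 4096)
except BlockingIOError:
    pass  # the pipe is full; a blocking writer would now be stuck here
print("pipe filled after", written, "bytes")
os.close(r)
os.close(w)
```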

Typically the two processes act as a client and server and utilize some kind of request/response style communication, something like half-duplex: one side is writing and the other one is reading, then they switch roles. This is practically the most complex setup we can handle with standard synchronous programming. And a deadlock can still occur when the client and server somehow get out of sync. This can be caused by an empty response, an unexpected error message, etc.

If there are several child processes, or when the communication protocol is not so simple, or we just want a robust solution, we need the parent to operate on all the pipes. communicate() uses threads for this purpose. The other approach is asynchronous I/O: first check which pipe is ready to do I/O, and only then read from or write to that pipe (or socket). The old and deprecated asyncore library implemented that.

On the low level, the select (or similar) system call checks which file handles from a given set are ready for I/O. But at that low level, we can do only one read or write before re-checking. That is the problem with this snippet:

while handle.poll() is None:
    try:
        line = handle.stdout.readline()
    except UnicodeDecodeError:
        line = "((INVALID UNICODE))\n"

The poll check tells us there is something to be read, but this does not mean we will be able to read repeatedly until a newline. We can only do one read and append the data to an input buffer. If there is a newline, we can extract the whole line and process it. If not, we need to wait for the next successful poll and read.
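The input-buffer logic just described might be sketched with a small hypothetical helper:

```python
class LineBuffer:
    """Accumulate raw reads; hand back only complete newline-terminated lines."""

    def __init__(self) -> None:
        self._buf = b""

    def feed(self, chunk: bytes) -> list:
        """Append one read's worth of data; return any lines completed by it."""
        self._buf += chunk
        # Everything before the last newline is complete; the tail waits
        # in the buffer for the next successful poll and read.
        *complete, self._buf = self._buf.split(b"\n")
        return complete
```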

Writes behave similarly. We can write once, check the number of bytes written, and remove that many bytes from the output buffer.
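Putting the readiness check and the one-read-per-check rule together, a sketch using the standard selectors module could look like this (the child commands here are placeholders for demonstration):

```python
import selectors
import subprocess
import sys

# Multiplex over several children: each pipe gets exactly one read
# per readiness check, so no single child can stall the others.
cmds = [[sys.executable, "-c", f"print({i})"] for i in range(3)]
sel = selectors.DefaultSelector()
procs = []
for cmd in cmds:
    p = subprocess.Popen(cmd, stdout=subprocess.PIPE)
    sel.register(p.stdout, selectors.EVENT_READ)
    procs.append(p)

received = []
open_pipes = len(procs)
while open_pipes:
    for key, _ in sel.select():
        chunk = key.fileobj.read1(4096)  # exactly one read, then re-check
        if chunk:
            received.append(chunk)
        else:  # EOF on this pipe
            sel.unregister(key.fileobj)
            key.fileobj.close()
            open_pipes -= 1
for p in procs:
    p.wait()
print(b"".join(received).decode())
```

Note that the chunks arrive in whatever order the children produce them, not split on line boundaries, which is exactly why the line buffering described above has to be layered on top.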

That implies that line buffering and all that higher-level stuff needs to be implemented on top of that. Fortunately, the successor of asyncore offers what we need: asyncio subprocesses.

I hope I could explain the deadlock. The solution could be expected: if you need to do several things at once, use either threading or asyncio.
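A minimal sketch of the threading option, with one reader thread per pipe (again with placeholder commands): each thread blocks on its own child's stdout, so no pipe ever fills up, while the main thread prints lines as soon as they arrive.

```python
import queue
import subprocess
import sys
import threading

def drain(pipe, q, tag):
    """Read one child's stdout to EOF, forwarding each line to the queue."""
    for line in iter(pipe.readline, b""):
        q.put((tag, line))
    q.put((tag, None))  # EOF marker

cmds = [[sys.executable, "-c", f"print('child', {i})"] for i in range(3)]
q = queue.Queue()
procs = []
for i, cmd in enumerate(cmds):
    p = subprocess.Popen(cmd, stdout=subprocess.PIPE)
    threading.Thread(target=drain, args=(p.stdout, q, i), daemon=True).start()
    procs.append(p)

lines = []
open_streams = len(procs)
while open_streams:
    tag, line = q.get()
    if line is None:
        open_streams -= 1
    else:
        lines.append((tag, line))
        print(f"[{tag}]", line.decode().rstrip())
for p in procs:
    p.wait()
```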


UPDATE:

Below is a short asyncio test program. It reads input from several child processes and prints the data line by line.

But first, a cmd.py helper which prints a line in several small chunks to demonstrate the line buffering. Try it, e.g. with python3 cmd.py 10.

import sys
import time

def countdown(n):
    print('START', n)
    while n >= 0: 
        print(n, end=' ', flush=True)
        time.sleep(0.1)
        n -= 1
    print('END')

if __name__ == '__main__':
    args = sys.argv[1:]
    if len(args) != 1:
        sys.exit(3)
    countdown(int(args[0]))

And the main program:

import asyncio

PROG = 'cmd.py'
NPROC = 12

async def run1(*execv):
    """Run a program, read input lines."""
    proc = await asyncio.create_subprocess_exec(
        *execv,
        stdin=asyncio.subprocess.DEVNULL,
        stdout=asyncio.subprocess.PIPE,
        stderr=asyncio.subprocess.DEVNULL)
    # proc.stdout is a StreamReader object
    async for line in proc.stdout:
        print("Got line:", line.decode().strip())

async def manager(prog, nproc):
    """Spawn 'nproc' copies of python script 'prog'."""
    tasks = [asyncio.create_task(run1('python3', prog, str(i))) for i in range(nproc)]
    await asyncio.wait(tasks)

if __name__ == '__main__':
    asyncio.run(manager(PROG, NPROC))

The async for line... is a feature of StreamReader similar to the for line in file: idiom. It can be replaced with:

    while True:
        line = await proc.stdout.readline()
        if not line:
            break
        print("Got line:", line.decode().strip())
