
How to avoid leaking file descriptors from multiprocessing.Pipe to subsequent child processes?

To take advantage of several CPU cores in a Python program, I am using the multiprocessing module and sending data via its Pipe class. But when the main program closes the sending end, the child processes block on recv() instead of raising an EOFError exception. This is caused by open file descriptors, which need to be closed in the other process context first, as described in these (and other) answers:

Why doesn't pipe.close() cause EOFError during pipe.recv() in python multiprocessing?

Python multiprocessing pipe recv() doc unclear or did I miss anything?

My problem is that when consecutively creating two Processes with Pipes, the second one inherits the remaining, "parent"-end file descriptor of the first one's Pipe. So closing the first Pipe again leads to hanging instead of EOFError, even though each Pipe's unused ends were closed as recommended.

This code illustrates the problem (Linux only):

import os
import time
import multiprocessing as mp
import subprocess


class MeasurementWriter:

    def __init__(self, name):
        self.name = name
        self.parent_conn = None
        self.worker = None

    def open(self):
        conn_pair = mp.Pipe()
        self.worker = mp.Process(target=self.run, name=self.name, args=(conn_pair,))
        self.worker.start()
        self.parent_conn, child_conn = conn_pair
        print('pid %d started %d; fds: %d %d'
              % (os.getpid(), self.worker.pid,
                 self.parent_conn.fileno(), child_conn.fileno()))
        # Close the other end, as it is not needed in our process context
        child_conn.close()
        subprocess.call(["ls", "-l", "/proc/%d/fd" % os.getpid()])

    def close(self):
        if self.parent_conn is None:
            print('not open')
            return
        print('closing pipe', self.parent_conn.fileno())
        self.parent_conn.close()
        print('joining worker')
        self.worker.join()  # HANGS if more than one mp.Process has been started!

    def run(self, conn_pair):
        parent_conn, conn = conn_pair
        print('%s pid %d started; fds: %d %d'
              % (self.name, os.getpid(), parent_conn.fileno(), conn.fileno()))
        # Close the other end, as it is not needed in our process context
        parent_conn.close()
        time.sleep(0.5)
        print(self.name, 'parent_conn.closed =', parent_conn.closed)
        subprocess.call(["ls", "-l", "/proc/%d/fd" % os.getpid()])
        try:
            print(self.name, 'recv blocking...')
            data = conn.recv()
            print(self.name, 'recv', data)
        except EOFError:
            print(self.name, 'EOF')
        conn.close()


if __name__ == '__main__':
    a = MeasurementWriter('A')
    a.open()

    # Increase fd numbers to make them recognizable
    n = open('/dev/null')
    z = open('/dev/zero')
    # Wait for debug printing to complete
    time.sleep(1)

    b = MeasurementWriter('B')
    b.open()  # Comment out this line to see a clean exit

    time.sleep(2)

    # Clean up
    a.close()  # HANGS: The parent_conn fd is still open in the second Process
    b.close()

The output is as follows (some uninteresting fd lines omitted). Tested with Python 3.5 and 3.8.10 under Linux:

pid 592770 started 592771; fds: 3 4
A pid 592771 started; fds: 3 4
total 0
lrwx------ 1 acolomb acolomb 64 Mar 18 19:02 3 -> 'socket:[8294651]'
lr-x------ 1 acolomb acolomb 64 Mar 18 19:02 5 -> 'pipe:[8294653]'
A parent_conn.closed = True
total 0
lrwx------ 1 acolomb acolomb 64 Mar 18 19:02 4 -> 'socket:[8294652]'
lr-x------ 1 acolomb acolomb 64 Mar 18 19:02 5 -> /dev/null
l-wx------ 1 acolomb acolomb 64 Mar 18 19:02 6 -> 'pipe:[8294653]'
A recv blocking...
pid 592770 started 592774; fds: 7 8
B pid 592774 started; fds: 7 8
total 0
lrwx------ 1 acolomb acolomb 64 Mar 18 19:02 3 -> 'socket:[8294651]'
lr-x------ 1 acolomb acolomb 64 Mar 18 19:02 4 -> /dev/null
lr-x------ 1 acolomb acolomb 64 Mar 18 19:02 5 -> 'pipe:[8294653]'
lr-x------ 1 acolomb acolomb 64 Mar 18 19:02 6 -> /dev/zero
lrwx------ 1 acolomb acolomb 64 Mar 18 19:02 7 -> 'socket:[8294672]'
lr-x------ 1 acolomb acolomb 64 Mar 18 19:02 9 -> 'pipe:[8294674]'
B parent_conn.closed = True
total 0
l-wx------ 1 acolomb acolomb 64 Mar 18 19:02 10 -> 'pipe:[8294674]'
lrwx------ 1 acolomb acolomb 64 Mar 18 19:02 3 -> 'socket:[8294651]'
lr-x------ 1 acolomb acolomb 64 Mar 18 19:02 4 -> /dev/null
lr-x------ 1 acolomb acolomb 64 Mar 18 19:02 5 -> 'pipe:[8294653]'
lr-x------ 1 acolomb acolomb 64 Mar 18 19:02 6 -> /dev/zero
lrwx------ 1 acolomb acolomb 64 Mar 18 19:02 8 -> 'socket:[8294673]'
lr-x------ 1 acolomb acolomb 64 Mar 18 19:02 9 -> /dev/null
B recv blocking...
closing pipe 3
joining worker

We can see that the youngest process (B) has inherited fd number 3, which belongs to the parent end of A's Pipe. Therefore closing that fd in the parent will not terminate A's process, as it is still referenced elsewhere. How can I avoid subsequent child processes inheriting the file descriptors of another child's Pipe objects?

For this simple example, switching the order of the .close() calls would probably help, but in reality they may be started in random order based on user interactions. The intended use is to write several output streams (one MeasurementWriter instance for each), with transparent compression handled in an associated child process so that the main process is not blocked regularly.

One suggestion I found at https://microeducate.tech/using-python-multiprocessing-pipes/ keeps track of all pipe ends in the parent process using a list, then closes all unrelated ones in each newly created child process. But I have no good place for such a "manager", as these objects come and go during the app's lifetime.
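For reference, a minimal sketch of that registry idea (all names here are illustrative, not taken from the linked post): keep a list of the parent-side connections, pass it to each new worker, and have the worker close every parent end it inherited, including its own pipe's.

```python
import multiprocessing as mp

# Hypothetical registry of parent-side pipe ends still open in the parent.
_open_parent_conns = []


def _worker(parent_conn, own_conn, other_parent_conns):
    # Close our own pipe's parent end plus every other parent-side end this
    # child inherited, so that only the parent process still holds them and
    # closing them there reliably produces EOFError here.
    parent_conn.close()
    for c in other_parent_conns:
        c.close()
    while True:
        try:
            item = own_conn.recv()
        except EOFError:
            break
        print('got', item)
    own_conn.close()


def start_worker():
    parent_conn, child_conn = mp.Pipe()
    # Hand the currently open parent ends to the child so it can close them.
    p = mp.Process(target=_worker,
                   args=(parent_conn, child_conn, list(_open_parent_conns)))
    p.start()
    child_conn.close()  # not needed in the parent's context
    _open_parent_conns.append(parent_conn)
    return parent_conn, p
```

A fuller version would also remove entries from the registry once the parent closes them; this sketch only shows the close-on-start half of the idea.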

In a real-life situation one process would probably be in a loop doing recv calls on its connection. Since we have seen that getting an EOFError exception is undependable when the connection is closed on the other end, the simplest solution is for the sending end to signal "end of file" by issuing a send call with a special sentinel item that cannot be mistaken for a normal data item. None is often suitable for that purpose.

So modify the close method to be:

    def close(self):
        if self.parent_conn is None:
            print('not open')
            return
        print('closing pipe', self.parent_conn.fileno())
        self.parent_conn.send(None) # Sentinel
        self.parent_conn.close()
        print('joining worker')
        self.worker.join()  # Now returns promptly: the worker exits on the sentinel

And a more realistic run method might be:

    def run(self, conn_pair):
        parent_conn, conn = conn_pair
        print('%s pid %d started; fds: %d %d'
              % (self.name, os.getpid(), parent_conn.fileno(), conn.fileno()))
        # Close the other end, as it is not needed in our process context
        parent_conn.close()
        time.sleep(0.5)
        print(self.name, 'parent_conn.closed =', parent_conn.closed)
        subprocess.call(["ls", "-l", "/proc/%d/fd" % os.getpid()])
        try:
            while True:
                print(self.name, 'recv blocking...')
                data = conn.recv()
                if data is None: # Sentinel?
                    break
                print(self.name, 'recv', data)
        except EOFError:
            print(self.name, 'EOF')
        conn.close()
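To see the sentinel pattern in isolation, here is a minimal, self-contained sketch (function names are illustrative). The worker terminates on the None sentinel rather than relying on EOFError, so leaked parent-end descriptors in other children no longer matter for shutdown:

```python
import multiprocessing as mp


def consumer(conn):
    # Drain the pipe until the None sentinel arrives, then exit cleanly.
    while True:
        data = conn.recv()
        if data is None:      # end-of-stream sentinel from the sender
            break
        print('consumed', data)
    conn.close()


if __name__ == '__main__':
    parent_conn, child_conn = mp.Pipe()
    worker = mp.Process(target=consumer, args=(child_conn,))
    worker.start()
    child_conn.close()        # unused in the parent's context
    for item in (1, 2, 3):
        parent_conn.send(item)
    parent_conn.send(None)    # signal end of stream
    parent_conn.close()
    worker.join()             # returns even if another child holds our fd
```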
