Python multiprocessing gets stuck when passing large array through pipe

I'm using multiprocessing in python and trying to pass a large numpy array to a subprocess through a pipe. It works well with a small array but hangs for larger arrays without raising an error.

I believe that the pipe is blocked; I have already read a bit about it but cannot figure out how to solve the problem.

def f2(conn, x):
    conn.start()
    data = conn.recv()
    conn.join()

    print(data)
    do_something(x)

    conn.close()

if __name__ == '__main__':
    data_input = read_data()    # large numpy array
    parent_conn, child_conn = Pipe()

    p = multiprocessing.Pool(processes=8)      
    func = partial(f2, child_conn)

    parent_conn.send(data_input)
    parent_conn.close()

    result = p.map(func, processes)

    p.close()
    p.join()

Ignoring all the other problems in this code (you don't have an x to pass to map, you don't use the x that f2 receives, and mixing Pool.map with Pipe is usually the wrong thing to do), your ultimate problem is the blocking send call being performed before any worker process is available to read from it.
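To see why in isolation: a Pipe is backed by a finite OS buffer, so send returns immediately only while the pickled payload fits in that buffer; with nothing reading the other end, a large payload blocks forever. A minimal sketch (the array sizes are illustrative):

import numpy as np
from multiprocessing import Pipe

if __name__ == '__main__':
    parent_conn, child_conn = Pipe()
    # Small payload: fits in the OS pipe buffer, so send returns immediately
    parent_conn.send(np.zeros(10))
    # Large payload: exceeds the buffer; with no process reading from
    # child_conn, this send would block indefinitely, so it's commented out
    # parent_conn.send(np.zeros(10**7))
    parent_conn.close()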

Assuming you really want to mix map with Pipe, the solution is to launch the map asynchronously before beginning the send, so there is something on the other side reading from the Pipe while the parent is trying to write to it:

if __name__ == '__main__':
    data_input = read_data()    # large numpy array
    parent_conn, child_conn = Pipe()

    # Use with to avoid needing to explicitly close/join
    with multiprocessing.Pool(processes=8) as p:
        func = partial(f2, child_conn)

        # Launch async map to ensure workers are running
        future = p.map_async(func, x)

        # Can perform blocking send as workers will consume as you send
        parent_conn.send(data_input)
        parent_conn.close()

        # Now you can wait on the map to complete
        result = future.get()

As noted, this code will not run due to the issues with x, and even if it did, the Pipe documentation explicitly warns that two different processes should not read from the same end of a Pipe at the same time.
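If you really needed map-style workers that each receive piped data, one option is a dedicated Pipe per worker, so that no two processes ever share a read end; a rough sketch (the payload string is a placeholder for real data slices):

import multiprocessing
from multiprocessing import Pipe

def f2(conn):
    data = conn.recv()   # each worker reads from its own dedicated connection
    conn.close()
    print(data)

if __name__ == '__main__':
    n_workers = 8
    pipes = [Pipe() for _ in range(n_workers)]
    with multiprocessing.Pool(processes=n_workers) as p:
        # One task per connection; no read end is shared between processes
        future = p.map_async(f2, [child for _, child in pipes])
        for parent, _ in pipes:
            parent.send('some chunk')   # placeholder; send real slices here
            parent.close()
        future.get()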

If you wanted to process the data in bulk in a single worker, you'd just use Process and Pipe, something like:

import multiprocessing
from multiprocessing import Pipe

def f2(conn):
    data = conn.recv()   # blocks until the parent sends the array
    conn.close()
    print(data)

if __name__ == '__main__':
    parent_conn, child_conn = Pipe()

    proc = multiprocessing.Process(target=f2, args=(child_conn,))
    proc.start()

    data_input = read_data()    # large numpy array
    parent_conn.send(data_input)
    parent_conn.close()

    proc.join()
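Since Pipe() is duplex by default, the same pattern extends to sending a result back over the same connection; a rough sketch, with the sum standing in for real work:

import multiprocessing
import numpy as np
from multiprocessing import Pipe

def f2(conn):
    data = conn.recv()      # blocks until the parent sends the array
    conn.send(data.sum())   # reply to the parent over the same duplex connection
    conn.close()

if __name__ == '__main__':
    parent_conn, child_conn = Pipe()
    proc = multiprocessing.Process(target=f2, args=(child_conn,))
    proc.start()

    data_input = np.arange(10**6)   # stand-in for read_data()
    parent_conn.send(data_input)
    print(parent_conn.recv())       # fetch the child's reply
    parent_conn.close()
    proc.join()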

If you wanted to process each element separately across many workers, you'd just use Pool and map:

import multiprocessing

def f2(x):
    print(x)

if __name__ == '__main__':
    data_input = read_data()    # large numpy array
    with multiprocessing.Pool(processes=8) as p:   
        result = p.map(f2, data_input)
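Note that map iterates the array along its first axis and pickles every element to a worker individually, which gets expensive for a large array; passing a chunksize batches elements per round-trip. A sketch assuming 2-D numpy input:

import multiprocessing
import numpy as np

def f2(row):
    return row.sum()   # placeholder per-row computation

if __name__ == '__main__':
    data_input = np.random.rand(100000, 64)   # stand-in for read_data()
    with multiprocessing.Pool(processes=8) as p:
        # chunksize batches rows per IPC round-trip, reducing pickling overhead
        result = p.map(f2, data_input, chunksize=256)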
