Python (3.7+) multiprocessing: replace Pipe connection between master and workers with asyncio for IO concurrency
Suppose we have the following toy version of a master-worker pipeline for parallel data collection:
# pip install gym
import gym
import numpy as np
from multiprocessing import Process, Pipe

def worker(master_conn, worker_conn):
    master_conn.close()  # close the master's end of the pipe in the child
    env = gym.make('Pendulum-v0')
    env.reset()
    while True:
        cmd, data = worker_conn.recv()
        if cmd == 'close':
            worker_conn.close()
            break
        elif cmd == 'step':
            results = env.step(data)
            worker_conn.send(results)
class Master(object):
    def __init__(self):
        self.master_conns, self.worker_conns = zip(*[Pipe() for _ in range(10)])
        self.list_process = [Process(target=worker, args=[master_conn, worker_conn], daemon=True)
                             for master_conn, worker_conn in zip(self.master_conns, self.worker_conns)]
        [p.start() for p in self.list_process]
        [worker_conn.close() for worker_conn in self.worker_conns]  # close worker ends in the parent

    def go(self, actions):
        [master_conn.send(['step', action]) for master_conn, action in zip(self.master_conns, actions)]
        results = [master_conn.recv() for master_conn in self.master_conns]
        return results

    def close(self):
        [master_conn.send(['close', None]) for master_conn in self.master_conns]
        [p.join() for p in self.list_process]
master = Master()
l = []
T = 1000
for t in range(T):
    actions = np.random.rand(10, 1)
    results = master.go(actions)
    l.append(len(results))
sum(l)
Because of the Pipe connection between the master and each worker, at every time step we have to send a command to each worker through its Pipe, and the worker sends the results back. We need to do this over a long horizon, so the frequent communication can sometimes be a bit slow.

Therefore, I am wondering: if I understand its functionality correctly, could using the newer Python feature asyncio combined with Process to replace Pipe potentially give a speedup thanks to IO concurrency?
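Concretely, something like the following sketch is what I have in mind: keep the multiprocessing Pipes, but overlap the blocking recv() calls with asyncio by pushing them onto the default thread-pool executor (recv_async and step_async are just placeholder names I made up):

import asyncio

async def recv_async(loop, conn):
    # Connection.recv() blocks, so run it in the default executor
    return await loop.run_in_executor(None, conn.recv)

async def step_async(master_conns, actions):
    loop = asyncio.get_running_loop()
    for conn, action in zip(master_conns, actions):
        conn.send(['step', action])
    # wait on all workers concurrently instead of one after another
    return await asyncio.gather(*[recv_async(loop, conn) for conn in master_conns])

# results = asyncio.run(step_async(master.master_conns, actions))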
The multiprocessing module already has a solution for parallel task processing: multiprocessing.Pool
from multiprocessing import Pool

def f(x):
    return x*x

if __name__ == '__main__':
    with Pool(processes=4) as pool:  # start 4 worker processes
        print(pool.map(f, range(10)))  # prints "[0, 1, 4, ..., 81]"
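If you wanted to keep one gym env per worker process with Pool, a rough sketch (my illustration, not tested against your exact setup) could use an initializer so each process creates its env once; _init_env and step_env are names I made up. One caveat: Pool does not pin a given item to a given worker, so this only suits workloads where any worker may process any action:

import gym
import numpy as np
from multiprocessing import Pool

_env = None  # per-process environment, created once by the initializer

def _init_env():
    global _env
    _env = gym.make('Pendulum-v0')
    _env.reset()

def step_env(action):
    # note: there is no guarantee which worker (and thus which env) gets
    # which action - fine for data collection, not for per-env rollouts
    return _env.step(action)

if __name__ == '__main__':
    with Pool(processes=10, initializer=_init_env) as pool:
        for t in range(1000):
            actions = np.random.rand(10, 1)
            results = pool.map(step_env, actions)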
You can achieve the same using multiprocessing.Queue. I believe that's how pool.map() is implemented internally.
So, what's the difference between multiprocessing.Queue and multiprocessing.Pipe? Queue is just a Pipe plus some locking mechanism. Therefore multiple worker processes can share just a single Queue (or rather two - one for commands, one for results), but with Pipe each process needs its own Pipe (or a pair of them, or a duplex one), which is exactly what you are doing now.
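Here is a minimal sketch of that shared-queue layout applied to your toy pipeline (again my illustration, with the same caveat as above: commands are no longer pinned to a particular env, and results come back in arbitrary order):

import gym
from multiprocessing import Process, Queue

def worker(cmd_queue, result_queue):
    # every worker owns its own env but pulls from the same command queue
    env = gym.make('Pendulum-v0')
    env.reset()
    while True:
        cmd, data = cmd_queue.get()
        if cmd == 'close':
            break
        elif cmd == 'step':
            result_queue.put(env.step(data))

if __name__ == '__main__':
    cmd_queue, result_queue = Queue(), Queue()
    procs = [Process(target=worker, args=[cmd_queue, result_queue], daemon=True)
             for _ in range(10)]
    [p.start() for p in procs]
    # one 'step' command per worker; workers race for commands
    for i in range(10):
        cmd_queue.put(['step', [0.0]])
    results = [result_queue.get() for _ in range(10)]
    [cmd_queue.put(['close', None]) for _ in range(10)]
    [p.join() for p in procs]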
The only disadvantage of Queue is performance - because all processes share one queue mutex, it doesn't scale well to many processes. To be sure it can handle tens of thousands of items per second I would choose Pipe, but for the classic parallel task processing use case I think Queue or just Pool.map() should be OK because they are much easier to use. (Managing processes can be tricky, and asyncio doesn't make it easier either.)
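If you want to sanity-check that throughput difference on your machine, a quick single-process micro-benchmark could look like the sketch below. Note it only measures per-item serialization and transfer overhead, not the multi-process lock contention that hurts Queue at scale:

import time
from multiprocessing import Pipe, Queue

N = 100_000

# Pipe: send/recv within one process to measure raw transfer cost
a, b = Pipe()
start = time.perf_counter()
for i in range(N):
    a.send(i)
    b.recv()
print('Pipe:  %.0f items/s' % (N / (time.perf_counter() - start)))

# Queue: put/get goes through a feeder thread plus the shared lock
q = Queue()
start = time.perf_counter()
for i in range(N):
    q.put(i)
    q.get()
print('Queue: %.0f items/s' % (N / (time.perf_counter() - start)))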
Hope that helps; I'm aware I've answered a slightly different question than the one you asked :)