
Python (3.7+) multiprocessing: replace Pipe connection between master and workers with asyncio for IO concurrency

Suppose we have the following toy version of a master-worker pipeline for parallelizing data collection:

# pip install gym
import gym
import numpy as np
from multiprocessing import Process, Pipe

def worker(master_conn, worker_conn):
    # The child only needs its own end of the pipe.
    master_conn.close()

    env = gym.make('Pendulum-v0')
    env.reset()

    while True:
        cmd, data = worker_conn.recv()

        if cmd == 'close':
            worker_conn.close()
            break
        elif cmd == 'step':
            results = env.step(data)
            worker_conn.send(results)

class Master(object):
    def __init__(self):
        # One duplex Pipe per worker: the master keeps one end, the child the other.
        self.master_conns, self.worker_conns = zip(*[Pipe() for _ in range(10)])
        self.list_process = [Process(target=worker, args=[master_conn, worker_conn], daemon=True)
                             for master_conn, worker_conn in zip(self.master_conns, self.worker_conns)]
        for p in self.list_process:
            p.start()
        # The parent closes its copies of the worker ends after starting the children.
        for worker_conn in self.worker_conns:
            worker_conn.close()

    def go(self, actions):
        # Send a command to every worker, then block on each reply in turn.
        for master_conn, action in zip(self.master_conns, actions):
            master_conn.send(['step', action])
        results = [master_conn.recv() for master_conn in self.master_conns]

        return results

    def close(self):
        for master_conn in self.master_conns:
            master_conn.send(['close', None])
        for p in self.list_process:
            p.join()

master = Master()
l = []
T = 1000
for t in range(T):
    actions = np.random.rand(10, 1)
    results = master.go(actions)
    l.append(len(results))

print(sum(l))  # expect 10 * T = 10000
master.close()

Because of the Pipe connection between the master and each worker, at every time step we have to send a command to the worker through the Pipe and wait for it to send back the results. We need to do this over a long horizon, so the frequent communication can make things a bit slow.

Therefore, I am wondering: if I understand its functionality correctly, could replacing the Pipe round-trips with the newer Python feature asyncio, combined with Process, potentially give a speedup thanks to IO concurrency?
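
For concreteness, here is a rough sketch of the kind of change I have in mind. The helper names recv_async and go_async are mine, and running the blocking recv() calls in the default thread pool is just one possible way to overlap the waits, not necessarily the best one:

import asyncio

async def recv_async(conn):
    # Run the blocking recv() in the default thread pool so that
    # waits on different workers can overlap.
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(None, conn.recv)

async def go_async(master_conns, actions):
    for master_conn, action in zip(master_conns, actions):
        master_conn.send(['step', action])
    # Gather all replies concurrently instead of recv()-ing them one by one.
    return await asyncio.gather(*(recv_async(conn) for conn in master_conns))

# results = asyncio.run(go_async(master.master_conns, actions))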

The multiprocessing module already has a solution for parallel task processing: multiprocessing.Pool

from multiprocessing import Pool

def f(x):
    return x*x

if __name__ == '__main__':
    with Pool(processes=4) as pool:         # start 4 worker processes
        print(pool.map(f, range(10)))       # prints "[0, 1, 4,..., 81]"

You can achieve the same using multiprocessing.Queue. I believe that's how pool.map() is implemented internally.
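
For illustration, here is a minimal sketch of that pattern with two shared queues, one for tasks and one for results. The names task_queue and result_queue and the None stop sentinel are my own choices, not how Pool actually spells it internally:

from multiprocessing import Process, Queue

def queue_worker(task_queue, result_queue):
    # Every worker blocks on the same task queue; None means "stop".
    for x in iter(task_queue.get, None):
        result_queue.put(x * x)

if __name__ == '__main__':
    task_queue, result_queue = Queue(), Queue()
    workers = [Process(target=queue_worker, args=(task_queue, result_queue), daemon=True)
               for _ in range(4)]
    for p in workers:
        p.start()
    for x in range(10):
        task_queue.put(x)
    # Results may arrive out of order, since any worker can answer any task.
    print(sorted(result_queue.get() for _ in range(10)))  # [0, 1, 4, ..., 81]
    for _ in workers:
        task_queue.put(None)
    for p in workers:
        p.join()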

So, what's the difference between multiprocessing.Queue and multiprocessing.Pipe? A Queue is just a Pipe plus some locking mechanism. Therefore multiple worker processes can share a single Queue (or rather two: one for commands, one for results), whereas with Pipe each process needs its own Pipe (or a pair, or a duplex one), exactly as you are doing it now.

The only disadvantage of Queue is performance: because all processes share one queue mutex, it doesn't scale well to many processes. If I had to be sure it could handle tens of thousands of items per second, I would choose Pipe, but for the classic parallel task processing use case I think Queue or just Pool.map() should be fine, since they are much easier to use. (Managing processes can be tricky, and asyncio doesn't make it any easier.)
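
If you do stay with one Pipe per worker, one standard-library tool worth mentioning (my own addition here, not something either of us used above) is multiprocessing.connection.wait(), which lets the master block on all connections at once and service whichever worker replies first:

from multiprocessing.connection import wait

def recv_all(master_conns):
    # Collect one result per connection, handling replies as they arrive
    # instead of blocking on each connection in a fixed order.
    pending = set(master_conns)
    results = {}
    while pending:
        for conn in wait(list(pending)):  # blocks until at least one is readable
            results[conn] = conn.recv()
            pending.discard(conn)
    return [results[conn] for conn in master_conns]  # restore the original order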

Hope that helps; I'm aware that I've answered a slightly different question than the one you asked :)

