
Returning large objects from child processes in python multiprocessing

I'm working with Python multiprocessing to spawn some workers. Each of them should return an array that's a few MB in size.

  1. Is it correct that since my return array is created in the child process, it needs to be copied back to the parent's memory when the process ends? (this seems to take a while, but it might be a pypy issue)
  2. Is there a mechanism to allow the parent and child to access the same in-memory object? (synchronization is not an issue since only one child would access each object)

I'm afraid I have a few gaps in how Python implements multiprocessing, and trying to persuade pypy to play nice is not making things any easier. Thanks!

Yes, if the return array is created in the child process, it must be sent to the parent by pickling it, sending the pickled bytes back to the parent via a Pipe, and then unpickling the object in the parent. For a large object, this is pretty slow in CPython, so it's not just a PyPy issue. It is possible that performance is worse in PyPy, though; I haven't tried comparing the two, but this PyPy bug seems to suggest that multiprocessing in PyPy is slower than in CPython.
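To make the cost concrete, here is a minimal sketch of that default path, assuming a Pool of workers that each return a list of a million floats (the worker function and sizes are illustrative, not from the question):

    import multiprocessing as mp

    def worker(n):
        # The child builds a few-MB list; returning it forces the whole
        # object to be pickled, piped back to the parent, and unpickled.
        return [0.0] * n

    if __name__ == '__main__':
        with mp.Pool(processes=4) as pool:
            # Each return value pays the full pickle/Pipe/unpickle cost.
            results = pool.map(worker, [1000000] * 4)
        print(len(results), len(results[0]))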

In CPython, there is a way to allocate ctypes objects in shared memory, via multiprocessing.sharedctypes. PyPy seems to support this API, too. The limitation (obviously) is that you're restricted to ctypes objects.
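A sketch of what that can look like, using RawArray since it skips the synchronization wrapper and the question says only one child touches each object (names here are illustrative):

    import multiprocessing as mp
    from multiprocessing.sharedctypes import RawArray

    def worker(shared, n):
        # Writes land directly in shared memory, so nothing needs to be
        # pickled back to the parent when the child finishes.
        for i in range(n):
            shared[i] = float(i)

    if __name__ == '__main__':
        n = 1000000
        arr = RawArray('d', n)  # 'd' = C double, allocated in shared memory
        p = mp.Process(target=worker, args=(arr, n))
        p.start()
        p.join()
        print(arr[0], arr[n - 1])  # parent sees the child's writes

If you ever do need locking around the object, multiprocessing.Array provides the same thing with an optional lock attached.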

There is also multiprocessing.Manager, which would allow you to create a shared array/list object in a Manager process, and then both the parent and child could access the shared list via a Proxy object. The downside there is that read/write performance on the object is much slower than for a local object, and even slower than for a roughly equivalent object created using multiprocessing.sharedctypes.
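A sketch of the Manager route (again with illustrative names). Note that every access through the proxy is a round-trip to the Manager process, so batched operations like extend() are much cheaper than element-by-element writes:

    import multiprocessing as mp

    def worker(shared_list, n):
        # extend() is a single round-trip to the Manager process;
        # appending in a loop would pay that cost n times.
        shared_list.extend(range(n))

    if __name__ == '__main__':
        with mp.Manager() as manager:
            shared = manager.list()
            p = mp.Process(target=worker, args=(shared, 1000))
            p.start()
            p.join()
            print(len(shared))  # 1000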
