
Why does python multiprocessing pickle objects to pass objects between processes?

Why does the multiprocessing package for Python pickle objects to pass them between processes, i.e., to return results from different processes to the main interpreter process? This may be an incredibly naive question, but why can't process A say to process B "object x is at address y in memory, it's yours now" without having to perform the work necessary to represent the object as a string of bytes?
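A quick sketch that makes the pickling visible: a picklable module-level function works with a Pool, while an unpicklable lambda fails (the exact exception type varies by Python version):

```python
import multiprocessing

def square(x):
    # A module-level function can be pickled by name.
    return x * x

if __name__ == "__main__":
    with multiprocessing.Pool(2) as pool:
        print(pool.map(square, [1, 2, 3]))  # [1, 4, 9]
        try:
            # Lambdas cannot be pickled, so sending one to a worker fails.
            pool.map(lambda x: x * x, [1, 2, 3])
        except Exception as exc:
            print(type(exc).__name__, exc)
```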

multiprocessing runs jobs in different processes. Processes have their own independent memory spaces, and in general cannot share data through memory.
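A minimal sketch of that isolation (the variable name and values are arbitrary): a child's write to a module-level variable never reaches the parent.

```python
import multiprocessing

counter = 0  # lives in this process's own memory

def worker():
    global counter
    counter += 100
    print("child sees counter =", counter)  # 100

if __name__ == "__main__":
    p = multiprocessing.Process(target=worker)
    p.start()
    p.join()
    # The child modified its own copy; the parent's value is untouched.
    print("parent still sees counter =", counter)  # 0
```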

To make processes communicate, you need some sort of channel. One possible channel would be a "shared memory segment", which is pretty much what it sounds like. But it's more common to use "serialization". I haven't studied this issue extensively, but my guess is that shared memory couples the processes too tightly; serialization lets processes communicate without letting one process cause a fault in the other.
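For example, multiprocessing.Pipe provides such a channel; whatever you send is pickled on one end and unpickled on the other. A minimal sketch:

```python
import multiprocessing

def worker(conn):
    conn.send({"answer": 42})  # pickled on this side of the channel
    conn.close()

if __name__ == "__main__":
    parent_conn, child_conn = multiprocessing.Pipe()
    p = multiprocessing.Process(target=worker, args=(child_conn,))
    p.start()
    print(parent_conn.recv())  # unpickled on this side: {'answer': 42}
    p.join()
```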

When data sets are really large and speed is critical, shared memory segments may be the best way to go. The main example I can think of is video frame buffer image data (for example, passed from a user-mode driver to the kernel, or vice versa).
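A sketch of that approach, assuming Python 3.8+ where the multiprocessing.shared_memory module exists (the buffer size and fill value are arbitrary): the payload is written in place, never serialized.

```python
from multiprocessing import Process, shared_memory

def fill(name, size):
    shm = shared_memory.SharedMemory(name=name)  # attach by name
    shm.buf[:size] = bytes([7]) * size           # write directly into the segment
    shm.close()

if __name__ == "__main__":
    size = 10_000_000
    shm = shared_memory.SharedMemory(create=True, size=size)
    p = Process(target=fill, args=(shm.name, size))
    p.start()
    p.join()
    print(shm.buf[0], shm.buf[size - 1])  # 7 7, with no pickling of the payload
    shm.close()
    shm.unlink()
```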

http://en.wikipedia.org/wiki/Shared_memory

http://en.wikipedia.org/wiki/Serialization

Linux, and other *NIX operating systems, provide a built-in mechanism for sharing data via serialization: "domain sockets". These should be quite fast.
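Python exposes these through multiprocessing.connection. A sketch, assuming a *NIX system; the socket path is an arbitrary example:

```python
import os
from multiprocessing import Process
from multiprocessing.connection import Client, Listener

ADDRESS = "/tmp/mp_demo.sock"  # a string address selects a domain socket on *NIX

def child():
    with Client(ADDRESS) as conn:
        conn.send([1, 2, 3])   # pickled, then written to the socket

if __name__ == "__main__":
    if os.path.exists(ADDRESS):
        os.unlink(ADDRESS)     # remove a stale socket file from a previous run
    with Listener(ADDRESS, family="AF_UNIX") as listener:
        p = Process(target=child)
        p.start()              # the listener is already bound, so no race here
        with listener.accept() as conn:
            print(conn.recv()) # [1, 2, 3]
        p.join()
```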

http://en.wikipedia.org/wiki/Unix_domain_socket

Since Python has pickle, which works well for serialization, multiprocessing uses it. pickle is a fast, binary format; it should be more efficient in general than a serialization format like XML or JSON. There are other binary serialization formats, such as Google Protocol Buffers.
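A sketch of the round trip that multiprocessing performs under the hood for every object it passes:

```python
import pickle

payload = {"task": "resize", "ids": [1, 2, 3]}
wire = pickle.dumps(payload)           # object -> compact binary bytes
print(type(wire), len(wire))           # <class 'bytes'> and a small size
print(pickle.loads(wire) == payload)   # True: an exact round trip
```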

One good thing about using serialization: it's about the same to share the work within one computer (to use additional cores) or to share the work between multiple computers (to use multiple computers in a cluster). The serialization work is identical, and network sockets behave much like domain sockets.
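To illustrate, the same Listener/Client pair from the domain-socket sketch works over TCP once the string path is swapped for a (host, port) tuple; the port and authkey below are arbitrary example values.

```python
import threading
from multiprocessing.connection import Client, Listener

ADDRESS = ("localhost", 17000)  # swap "localhost" for a hostname in a cluster

listener = Listener(ADDRESS, authkey=b"secret")  # bound before any client connects

def server():
    with listener.accept() as conn:
        conn.send({"result": 42})  # pickled exactly as it would be locally

t = threading.Thread(target=server)
t.start()
with Client(ADDRESS, authkey=b"secret") as conn:
    print(conn.recv())  # {'result': 42}
t.join()
listener.close()
```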

EDIT: @Mike McKerns said, in a comment below, that multiprocessing can use shared memory sometimes. I did a Google search and found this great discussion of it: Python multiprocessing shared memory
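For instance, multiprocessing's ctypes-backed containers, Value and Array, are allocated in shared memory, so a child's writes are visible to the parent without pickling the payload. A minimal sketch (the type code and values are arbitrary):

```python
from multiprocessing import Array, Process

def double(arr):
    for i in range(len(arr)):
        arr[i] *= 2        # writes go straight into the shared segment

if __name__ == "__main__":
    arr = Array("i", [1, 2, 3, 4])  # 'i' means C int
    p = Process(target=double, args=(arr,))
    p.start()
    p.join()
    print(arr[:])                   # [2, 4, 6, 8]
```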
