
Byte limit when transferring Python objects between Processes using a Pipe?

I have a custom simulator (for biology) running on a 64-bit Linux (kernel version 2.6.28.4) machine using a 64-bit Python 3.3.0 CPython interpreter.

Because the simulator depends on many independent experiments for valid results, I built in parallel processing for running experiments. Communication between the processes primarily occurs under a producer-consumer pattern with managed multiprocessing Queues (doc). The rundown of the architecture is as follows:

  • a master process that handles spawning and managing the Processes and the various Queues
  • N worker processes that run the simulations
  • 1 result consumer process that consumes the results of a simulation and sorts and analyzes them

The master process and the worker processes communicate via an input Queue. Similarly, the worker processes place their results in an output Queue, from which the result consumer process consumes items. The final ResultConsumer object is passed via a multiprocessing Pipe (doc) back to the master process.
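For concreteness, here is a minimal runnable sketch of that layout (every name here is a hypothetical stand-in, not the simulator's actual code):

import multiprocessing as mp

def run_experiment(params):
    # Stand-in for the real simulation work.
    return sum(params)

def worker(task_q, result_q):
    # N of these pull experiments from the input Queue until they
    # see the None sentinel, then exit.
    for params in iter(task_q.get, None):
        result_q.put(run_experiment(params))

def consumer(result_q, conn, n_expected):
    # Collects and "analyzes" every result, then sends the final
    # object back to the master through its end of the Pipe.
    results = sorted(result_q.get() for _ in range(n_expected))
    conn.send(results)

if __name__ == "__main__":
    task_q, result_q = mp.Queue(), mp.Queue()
    master_end, consumer_end = mp.Pipe()
    workers = [mp.Process(target=worker, args=(task_q, result_q)) for _ in range(4)]
    cons = mp.Process(target=consumer, args=(result_q, consumer_end, 8))
    for p in workers + [cons]:
        p.start()
    for params in [(1, 2), (3, 4)] * 4:   # 8 toy "experiments"
        task_q.put(params)
    for _ in workers:
        task_q.put(None)                  # one sentinel per worker
    print(master_end.recv())              # the final analyzed object
    for p in workers + [cons]:
        p.join()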

Everything works fine until it tries to pass the ResultConsumer object back to the master process via the Pipe:

Traceback (most recent call last):
  File "/home/cmccorma/.local/lib/python3.3/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/home/cmccorma/.local/lib/python3.3/multiprocessing/process.py", line 95, in run
    self._target(*self._args, **self._kwargs)
  File "DomainArchitectureGenerator.py", line 93, in ResultsConsumerHandler
    pipeConn.send(resCon)
  File "/home/cmccorma/.local/lib/python3.3/multiprocessing/connection.py", line 207, in send
    self._send_bytes(buf.getbuffer())
  File "/home/cmccorma/.local/lib/python3.3/multiprocessing/connection.py", line 394, in _send_bytes
    self._send(struct.pack("!i", n))
struct.error: 'i' format requires -2147483648 <= number <= 2147483647

I understand the first two traces (unhandled exits in the Process library), and the third is my line of code for sending the ResultConsumer object down the Pipe to the master process. The last two traces are where it gets interesting. A Pipe pickles any object that is sent to it and passes the resulting bytes to the other end (the matching connection), where it is unpickled upon running recv(). self._send_bytes(buf.getbuffer()) is attempting to send the bytes of the pickled object. self._send(struct.pack("!i", n)) is attempting to pack a struct with an integer (network/big-endian) of length n, where n is the length of the buffer passed in as a parameter (the struct library handles conversions between Python values and C structs represented as Python strings; see the doc).

This error only occurs when attempting a lot of experiments, e.g. 10 experiments will not cause it, but 1000 will consistently (all other parameters being constant). My best hypothesis so far as to why struct.error is thrown is that the number of bytes being pushed down the pipe exceeds 2^31-1 (2147483647), or ~2 GB.
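That hypothesis is easy to check in isolation: struct rejects any value outside the signed 32-bit range for the 'i' format, regardless of the platform's word size. A quick demonstration:

import struct

struct.pack("!i", 2**31 - 1)   # fine: the largest length 'i' can carry
struct.pack("!i", 2**31)       # struct.error: 'i' format requires
                               #   -2147483648 <= number <= 2147483647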

So my question is two-fold:

  1. I'm getting stuck with my investigations as struct.py essentially just imports from _struct and I have no idea where that is.

  2. The byte limit seems arbitrary given that the underlying architecture is all 64-bit. So, why can't I pass anything larger than that? Additionally, if I can't change this, are there any good (read: easy) workarounds to this issue?

Note: I don't think that using a Queue in place of a Pipe will solve the issue, as I suspect that Queues use a similar pickling intermediate step. EDIT: This note is entirely incorrect, as pointed out in abarnert's answer.

I'm getting stuck with my investigations as struct.py essentially just imports from _struct and I have no idea where that is.

In CPython, _struct is a C extension module built from _struct.c in the Modules directory in the source tree. You can find the code online here.

Whenever foo.py does an import _foo, that's almost always a C extension module, usually built from _foo.c. And if you can't find a foo.py at all, it's probably a C extension module, built from _foomodule.c.
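If you just want to confirm where the compiled module lives on your own machine, importlib can tell you. (Caveat: find_spec was added in Python 3.4; on 3.3 you could check _struct.__file__ instead, which exists only when the module was built as a shared object rather than statically linked in.)

import importlib.util

spec = importlib.util.find_spec("_struct")
print(spec.origin)   # e.g. a path ending in _struct...so, or 'built-in'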

It's also often worth looking at the equivalent PyPy source, even if you're not using PyPy. They reimplement almost all extension modules in pure Python, and for the remainder (including this case), the underlying "extension language" is RPython, not C.

However, in this case, you don't need to know anything about how struct is working beyond what's in the docs.


The byte limit seems arbitrary given that the underlying architecture is all 64-bit.

Look at the code it's calling:

self._send(struct.pack("!i", n))

If you look at the documentation, the 'i' format character explicitly means "4-byte C integer", not "whatever ssize_t is". For that, you'd have to use 'n'. Or you might want to explicitly use a long long, with 'q'.
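The difference is easy to see with struct.calcsize. One caveat if you try this yourself: 'n' is only accepted in native ('@') mode, so it can't be combined with the '!' byte-order prefix; 'q' works everywhere:

import struct

struct.calcsize("!i")          # 4 bytes: values up to 2**31 - 1
struct.calcsize("!q")          # 8 bytes: values up to 2**63 - 1
struct.pack("!q", 5 * 2**30)   # a ~5 GB length packs without complaint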

You can monkeypatch multiprocessing to use struct.pack('!q', n), or encode the length in some way other than struct. This will, of course, break compatibility with non-patched multiprocessing, which could be a problem if you're trying to do distributed processing across multiple computers or something. But it should be pretty simple:

def _send_bytes(self, buf):
    # For wire compatibility with 3.2 and lower
    n = len(buf)
    self._send(struct.pack("!q", n)) # was !i
    # The condition is necessary to avoid "broken pipe" errors
    # when sending a 0-length buffer if the other end closed the pipe.
    if n > 0:
        self._send(buf)

def _recv_bytes(self, maxsize=None):
    buf = self._recv(8) # was 4
    size, = struct.unpack("!q", buf.getvalue()) # was !i
    if maxsize is not None and size > maxsize:
        return None
    return self._recv(size)

Of course there's no guarantee that this change is sufficient; you'll want to read through the rest of the surrounding code and test the hell out of it.
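Applying the patch would look something like this (a sketch, assuming the two replacement functions above are in scope; on 3.3 the Connection class lives in multiprocessing.connection, and the patch has to be applied in every process before any Pipe or Queue traffic happens):

import multiprocessing.connection as connection

connection.Connection._send_bytes = _send_bytes
connection.Connection._recv_bytes = _recv_bytes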


Note: I suspect that using a Queue in place of a Pipe will not solve the issue, as I suspect that Queues use a similar pickling intermediate step.

Well, the problem has nothing to do with pickling. Pipe isn't using pickle to send the length; it's using struct. You can verify that pickle wouldn't have this problem: pickle.loads(pickle.dumps(1<<100)) == 1<<100 will return True.

(In earlier versions, pickle also had problems with huge objects, e.g. a list of 2G elements, which could have caused problems at a scale about 8x as high as the one you're currently hitting. But that's been fixed by 3.3.)

Meanwhile… wouldn't it be faster to just try it and see, instead of digging through the source to try to figure out whether it would work?


Also, are you sure you really want to pass around a 2GB data structure by implicit pickling?

If I were doing something that slow and memory-hungry, I'd prefer to make that explicit, e.g. pickle to a tempfile and send the path or fd. (If you're using numpy or pandas or something, use its binary file format instead of pickle, but same idea.)
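A sketch of that idea, with hypothetical helper names: pickle the big object to a temporary file and send only the short path through the Pipe:

import os
import pickle
import tempfile

def send_big(conn, obj):
    # Only the (tiny) file path crosses the Pipe.
    fd, path = tempfile.mkstemp(suffix=".pkl")
    with os.fdopen(fd, "wb") as f:
        pickle.dump(obj, f, protocol=pickle.HIGHEST_PROTOCOL)
    conn.send(path)

def recv_big(conn):
    path = conn.recv()
    try:
        with open(path, "rb") as f:
            return pickle.load(f)
    finally:
        os.remove(path)   # clean up once the object is loaded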

Or, even better, share the data. Yes, mutable shared state is bad… but sharing immutable objects is fine. Whatever you've got 2GB of, can you put it in a multiprocessing.Array, or put it in a ctypes array or struct (of arrays or structs of …) that you can share via multiprocessing.sharedctypes, or ctypes it out of a file that you mmap on both sides, or…? There's a bit of extra code to define and pick apart the structures, but when the benefits are likely to be this big, it's worth trying.
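For the shared-memory route, a minimal sketch, assuming the bulk of the data can be expressed as a flat ctypes array:

import ctypes
import multiprocessing as mp

def worker(shared):
    # Every process sees the same memory; nothing is pickled or copied.
    print(sum(shared[:1000]))

if __name__ == "__main__":
    # lock=False is safe as long as the array is fully populated before
    # the workers start and treated as immutable afterwards.
    shared = mp.Array(ctypes.c_double, 1000000, lock=False)
    shared[:] = range(1000000)
    p = mp.Process(target=worker, args=(shared,))
    p.start()
    p.join()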


Finally, when you think you've found a bug, an obvious missing feature, or an unreasonable limitation in Python, it's worth looking at the bug tracker. It looks like issue 17560: problem using multiprocessing with really big objects? is exactly your problem, and has lots of information, including suggested workarounds.
