
How do I pass large numpy arrays between python subprocesses without saving to disk?

Is there a good way to pass a large chunk of data between two python subprocesses without using the disk? Here's a cartoon example of what I'm hoping to accomplish:

import sys, subprocess, numpy

cmdString = """
import sys, numpy

done = False
while not done:
    cmd = raw_input()
    if cmd == 'done':
        done = True
    elif cmd == 'data':
        ##Fake data. In real life, get data from hardware.
        data = numpy.zeros(1000000, dtype=numpy.uint8)
        data.dump('data.pkl')
        sys.stdout.write('data.pkl' + '\\n')
        sys.stdout.flush()"""

proc = subprocess.Popen( #python vs. pythonw on Windows?
    [sys.executable, '-c', cmdString],
    stdin=subprocess.PIPE,
    stdout=subprocess.PIPE,
    stderr=subprocess.PIPE)

for i in range(3):
    proc.stdin.write('data\n')
    print proc.stdout.readline().rstrip()
    a = numpy.load('data.pkl')
    print a.shape

proc.stdin.write('done\n')

This creates a subprocess which generates a numpy array and saves the array to disk. The parent process then loads the array from disk. It works!

The problem is, our hardware can generate data 10x faster than the disk can read/write. Is there a way to transfer data from one python process to another purely in-memory, maybe even without making a copy of the data? Can I do something like passing-by-reference?

My first attempt at transferring data purely in-memory is pretty lousy:

import sys, subprocess, numpy

cmdString = """
import sys, numpy

done = False
while not done:
    cmd = raw_input()
    if cmd == 'done':
        done = True
    elif cmd == 'data':
        ##Fake data. In real life, get data from hardware.
        data = numpy.zeros(1000000, dtype=numpy.uint8)
        ##Note that this is NFG if there's a '10' (the newline byte) in the array:
        sys.stdout.write(data.tostring() + '\\n')
        sys.stdout.flush()"""

proc = subprocess.Popen( #python vs. pythonw on Windows?
    [sys.executable, '-c', cmdString],
    stdin=subprocess.PIPE,
    stdout=subprocess.PIPE,
    stderr=subprocess.PIPE)

for i in range(3):
    proc.stdin.write('data\n')
    a = numpy.fromstring(proc.stdout.readline().rstrip(), dtype=numpy.uint8)
    print a.shape

proc.stdin.write('done\n')

This is extremely slow (much slower than saving to disk) and very, very fragile. There's got to be a better way!

I'm not married to the 'subprocess' module, as long as the data-taking process doesn't block the parent application. I briefly tried 'multiprocessing', but without success so far.

Background: We have a piece of hardware that generates up to ~2 GB/s of data in a series of ctypes buffers. The python code to handle these buffers has its hands full just dealing with the flood of information. I want to coordinate this flow of information with several other pieces of hardware running simultaneously in a 'master' program, without the subprocesses blocking each other. My current approach is to boil the data down a little bit in the subprocess before saving to disk, but it'd be nice to pass the full monty to the 'master' process.

While googling around for more information about the code Joe Kington posted, I found the numpy-sharedmem package. Judging from this numpy/multiprocessing tutorial, it seems to share the same intellectual heritage (maybe largely the same authors? -- I'm not sure).

Using the sharedmem module, you can create a shared-memory numpy array (awesome!), and use it with multiprocessing like this:

import sharedmem as shm
import numpy as np
import multiprocessing as mp

def worker(q,arr):
    done = False
    while not done:
        cmd = q.get()
        if cmd == 'done':
            done = True
        elif cmd == 'data':
            ##Fake data. In real life, get data from hardware.
            rnd=np.random.randint(100)
            print('rnd={0}'.format(rnd))
            arr[:]=rnd
        q.task_done()

if __name__=='__main__':
    N=10
    arr=shm.zeros(N,dtype=np.uint8)
    q=mp.JoinableQueue()    
    proc = mp.Process(target=worker, args=[q,arr])
    proc.daemon=True
    proc.start()

    for i in range(3):
        q.put('data')
        # Wait for the computation to finish
        q.join()   
        print arr.shape
        print(arr)
    q.put('done')
    proc.join()

Running yields

rnd=53
(10,)
[53 53 53 53 53 53 53 53 53 53]
rnd=15
(10,)
[15 15 15 15 15 15 15 15 15 15]
rnd=87
(10,)
[87 87 87 87 87 87 87 87 87 87]

Basically, you just want to share a block of memory between processes and view it as a numpy array, right?

In that case, have a look at this (posted to numpy-discussion by Nadav Horesh a while back, not my work). There are a couple of similar implementations (some more flexible), but they all essentially use this principle.

#    "Using Python, multiprocessing and NumPy/SciPy for parallel numerical computing"
# Modified and corrected by Nadav Horesh, Mar 2010
# No rights reserved


import numpy as N
import ctypes
import multiprocessing as MP

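# NB: the c_long/c_ulong entries are platform-dependent; they map to 64-bit
# dtypes here, but ctypes.c_long is only 32 bits wide on (e.g.) Windows.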
_ctypes_to_numpy = {
    ctypes.c_char   : N.dtype(N.uint8),
    ctypes.c_wchar  : N.dtype(N.int16),
    ctypes.c_byte   : N.dtype(N.int8),
    ctypes.c_ubyte  : N.dtype(N.uint8),
    ctypes.c_short  : N.dtype(N.int16),
    ctypes.c_ushort : N.dtype(N.uint16),
    ctypes.c_int    : N.dtype(N.int32),
    ctypes.c_uint   : N.dtype(N.uint32),
    ctypes.c_long   : N.dtype(N.int64),
    ctypes.c_ulong  : N.dtype(N.uint64),
    ctypes.c_float  : N.dtype(N.float32),
    ctypes.c_double : N.dtype(N.float64)}

_numpy_to_ctypes = dict(zip(_ctypes_to_numpy.values(), _ctypes_to_numpy.keys()))


def shmem_as_ndarray(raw_array, shape=None):

    address = raw_array._obj._wrapper.get_address()
    size = len(raw_array)
    if (shape is None) or (N.asarray(shape).prod() != size):
        shape = (size,)
    elif type(shape) is int:
        shape = (shape,)
    else:
        shape = tuple(shape)

    dtype = _ctypes_to_numpy[raw_array._obj._type_]
    class Dummy(object): pass
    d = Dummy()
    d.__array_interface__ = {
        'data' : (address, False),
        'typestr' : dtype.str,
        'descr' :   dtype.descr,
        'shape' : shape,
        'strides' : None,
        'version' : 3}
    return N.asarray(d)

def empty_shared_array(shape, dtype, lock=True):
    '''
    Generate an empty MP shared array given ndarray parameters
    '''

    if type(shape) is not int:
        shape = N.asarray(shape).prod()
    try:
        c_type = _numpy_to_ctypes[dtype]
    except KeyError:
        c_type = _numpy_to_ctypes[N.dtype(dtype)]
    return MP.Array(c_type, shape, lock=lock)

def emptylike_shared_array(ndarray, lock=True):
    'Generate an empty shared array with the size and dtype of a given array'
    return empty_shared_array(ndarray.size, ndarray.dtype, lock)
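
For illustration, here is a minimal usage sketch of the helpers above (the worker name is hypothetical, and shmem_as_ndarray pokes at multiprocessing internals of its era, so treat this as a sketch under those assumptions rather than portable modern code):

import numpy as N
import multiprocessing as MP

def worker(raw, shape):
    a = shmem_as_ndarray(raw, shape)  # zero-copy view onto the shared block
    a *= 2                            # writes here are visible to the parent

if __name__ == '__main__':
    raw = empty_shared_array((4, 5), N.float64)  # lock=True by default
    a = shmem_as_ndarray(raw, (4, 5))
    a[:] = 1.0
    p = MP.Process(target=worker, args=[raw, (4, 5)])
    p.start()
    p.join()
    print(a)  # all 2.0: parent and child share one block of memory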

From the other answers, it seems that numpy-sharedmem is the way to go.

However, if you need a pure python solution, or if installing extensions, cython, or the like is a (big) hassle, you might want to use the following code, which is a simplified version of Nadav's code:

import numpy, ctypes, multiprocessing

_ctypes_to_numpy = {
    ctypes.c_char   : numpy.dtype(numpy.uint8),
    ctypes.c_wchar  : numpy.dtype(numpy.int16),
    ctypes.c_byte   : numpy.dtype(numpy.int8),
    ctypes.c_ubyte  : numpy.dtype(numpy.uint8),
    ctypes.c_short  : numpy.dtype(numpy.int16),
    ctypes.c_ushort : numpy.dtype(numpy.uint16),
    ctypes.c_int    : numpy.dtype(numpy.int32),
    ctypes.c_uint   : numpy.dtype(numpy.uint32),
    ctypes.c_long   : numpy.dtype(numpy.int64),
    ctypes.c_ulong  : numpy.dtype(numpy.uint64),
    ctypes.c_float  : numpy.dtype(numpy.float32),
    ctypes.c_double : numpy.dtype(numpy.float64)}

_numpy_to_ctypes = dict(zip(_ctypes_to_numpy.values(),
                            _ctypes_to_numpy.keys()))


def shm_as_ndarray(mp_array, shape = None):
    '''Given a multiprocessing.Array, returns an ndarray pointing to
    the same data.'''

    # support SynchronizedArray:
    if not hasattr(mp_array, '_type_'):
        mp_array = mp_array.get_obj()

    dtype = _ctypes_to_numpy[mp_array._type_]
    result = numpy.frombuffer(mp_array, dtype)

    if shape is not None:
        result = result.reshape(shape)

    return numpy.asarray(result)


def ndarray_to_shm(array, lock = False):
    '''Generate an 1D multiprocessing.Array containing the data from
    the passed ndarray.  The data will be *copied* into shared
    memory.'''

    array1d = array.ravel(order = 'A')

    try:
        c_type = _numpy_to_ctypes[array1d.dtype]
    except KeyError:
        c_type = _numpy_to_ctypes[numpy.dtype(array1d.dtype)]

    result = multiprocessing.Array(c_type, array1d.size, lock = lock)
    shm_as_ndarray(result)[:] = array1d
    return result

You would use it like this:

  1. Use sa = ndarray_to_shm(a) to convert the ndarray a into a shared multiprocessing.Array.
  2. Use multiprocessing.Process(target=somefunc, args=(sa,)) (and start, maybe join) to call somefunc in a separate process, passing the shared array.
  3. In somefunc, use a = shm_as_ndarray(sa) to get an ndarray pointing to the shared data. (Actually, you may want to do the same in the original process, immediately after creating sa, in order to have two ndarrays referencing the same data.) The sketch below puts these steps together.
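
Putting the three steps together, a minimal sketch (assuming shm_as_ndarray and ndarray_to_shm from above are defined in, or imported into, the same module; somefunc is just an illustrative name):

import numpy, multiprocessing

def somefunc(sa):
    a = shm_as_ndarray(sa)   # step 3: an ndarray view of the shared data
    print(a.sum())

if __name__ == '__main__':
    a = numpy.arange(12, dtype=numpy.float64)
    sa = ndarray_to_shm(a)   # step 1: copy the data into shared memory
    p = multiprocessing.Process(target=somefunc, args=(sa,))
    p.start()                # step 2: run somefunc in a separate process
    p.join()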

AFAICS, you don't need to set lock to True, since shm_as_ndarray will not use the locking anyhow. If you need locking, you would set lock to True and call acquire/release on sa.
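
If you do need the locked variant, a short sketch of that pattern (again assuming the helpers above):

import numpy

a = numpy.zeros(8, dtype=numpy.float64)
sa = ndarray_to_shm(a, lock=True)   # a SynchronizedArray wrapping an RLock

with sa.get_lock():                 # or call sa.acquire() / sa.release()
    view = shm_as_ndarray(sa)       # the Synchronized wrapper is unwrapped
    view += 1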

Also, if your array is not 1-dimensional, you might want to transfer the shape along with sa (e.g. use args = (sa, a.shape)).

This solution has the advantage that it does not need additional packages or extension modules, except multiprocessing (which is in the standard library).

Use threads. But I guess you are going to get problems with the GIL.

Instead: Choose your poison.

I know from the MPI implementations I work with that they use shared memory for on-node communication. You will have to code your own synchronization in that case.

2 GB/s sounds like you will get problems with most "easy" methods, depending on your real-time constraints and available main memory.

One possibility to consider is to use a RAM drive for the temporary storage of files to be shared between processes. A RAM drive is where a portion of RAM is treated as a logical hard drive, to which files can be written/read as you would with a regular drive, but at RAM read/write speeds.

This article describes using the ImDisk software (for MS Win) to create such a disk, obtaining file read/write speeds of 6-10 gigabytes/second: https://www.tekrevue.com/tip/create-10-gbs-ram-disk-windows/

An example in Ubuntu: https://askubuntu.com/questions/152868/how-do-i-make-a-ram-disk#152871
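
Once such a drive is mounted, using it from python is ordinary file I/O against a RAM-backed path. A minimal sketch (the /mnt/ramdisk mount point is an assumption; on Linux you could create it with, say, sudo mount -t tmpfs -o size=2g tmpfs /mnt/ramdisk):

import numpy

RAMDISK = '/mnt/ramdisk'  # assumed tmpfs mount point; adjust to your setup

# Producer side: write the array to the RAM-backed path.
data = numpy.zeros(1000000, dtype=numpy.uint8)
numpy.save(RAMDISK + '/data.npy', data)

# Consumer side: read it back at RAM rather than disk speeds.
a = numpy.load(RAMDISK + '/data.npy')
print(a.shape)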

Another noted benefit is that files with arbitrary formats can be passed around with such a method: e.g. Pickle, JSON, XML, CSV, HDF5, etc.

Keep in mind that anything stored on the RAM disk is wiped on reboot.

Use threads. You probably won't have problems with the GIL.

The GIL only affects Python code, not C/Fortran/Cython backed libraries. Most numpy operations and a good chunk of the C-backed Scientific Python stack release the GIL and can operate just fine on multiple cores. This blogpost discusses the GIL and scientific Python in more depth.

Edit

Simple ways to use threads include the threading module and multiprocessing.pool.ThreadPool.
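
For example, a minimal ThreadPool sketch (process_buffer is a hypothetical stand-in for your real per-buffer work, and the benefit depends on the numpy operations inside it releasing the GIL):

import numpy
from multiprocessing.pool import ThreadPool

def process_buffer(buf):
    # Heavy numpy operations release the GIL, so threads overlap usefully.
    return buf.astype(numpy.float64).sum()

buffers = [numpy.zeros(1000000, dtype=numpy.uint8) for _ in range(4)]
pool = ThreadPool(4)   # threads share the process's memory: no copies
results = pool.map(process_buffer, buffers)
pool.close()
pool.join()
print(results)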
