
RawArray from numpy array?

I want to share a numpy array across multiple processes. The processes only read the data, so I want to avoid making copies. I know how to do it if I can start with a multiprocessing.sharedctypes.RawArray and then create a numpy array using numpy.frombuffer. But what if I am initially given a numpy array? Is there a way to initialize a RawArray with the numpy array's data without copying the data? Or is there another way to share the data across the processes without copying it?

To my knowledge it is not possible to declare memory as shared after it has been assigned to a specific process. Similar discussions can be found here and here (more suitable).

Let me quickly sketch the workaround you mentioned: starting with a RawArray and getting a numpy.ndarray reference to it.

import ctypes
import numpy as np
from multiprocessing.sharedctypes import RawArray

# option 1: fixed element type and length
raw_arr = RawArray(ctypes.c_int, 12)
# option 2: set it up to match some existing np.ndarray np_arr2
raw_arr = RawArray(
        np.ctypeslib.as_ctypes_type(np_arr2.dtype), len(np_arr2)
        )
np_arr = np.frombuffer(raw_arr, dtype=np.dtype(raw_arr._type_))
# np_arr: numpy array backed by shared memory, usable with multiprocessing
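To round this off, here is a small sketch (the worker function is hypothetical, not part of the original answer) of how the shared buffer then travels: pass the RawArray itself to the child process and rebuild the ndarray view there with np.frombuffer, so no data is copied.

import ctypes
import numpy as np
from multiprocessing import Process
from multiprocessing.sharedctypes import RawArray

def worker(raw):
    # rebuild a numpy view onto the same shared buffer; nothing is copied
    arr = np.frombuffer(raw, dtype=np.dtype(raw._type_))
    print(arr.sum())

if __name__ == '__main__':
    raw_arr = RawArray(ctypes.c_int, 12)
    np.frombuffer(raw_arr, dtype=np.dtype(raw_arr._type_))[:] = np.arange(12)
    p = Process(target=worker, args=(raw_arr,))
    p.start()
    p.join()  # the child prints 66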

If you have to start with a numpy.ndarray, you have no choice but to copy the data:

import numpy as np
from multiprocessing.sharedctypes import RawArray

np_arr = np.zeros(shape=(3, 4), dtype=np.ubyte)
# option 1: convert the whole array to a nested ctypes array and copy row by row
tmp = np.ctypeslib.as_ctypes(np_arr)
raw_arr = RawArray(tmp._type_, tmp)
# option 2: flatten and copy element by element
raw_arr = RawArray(np.ctypeslib.as_ctypes_type(np_arr.dtype), np_arr.flatten())

print(raw_arr[:])
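Either way the RawArray ends up with its own copy of the data; continuing the snippet above makes that visible:

# mutating the source array afterwards does not affect the shared buffer,
# which shows the data really was copied
np_arr[0, 0] = 255
print(raw_arr[0])  # still 0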

I also have some of your requirements: (a) given a large numpy array, (b) need to share it among a bunch of processes, (c) read-only, etc. For this I have been using something along the lines of:

import ctypes
import numpy as np
from multiprocessing.sharedctypes import RawArray

mynparray = ...  # initialize a large array from a file
shrarr_base_ptr = RawArray(ctypes.c_double, mynparray.size)
shrarr_ptr = np.frombuffer(shrarr_base_ptr).reshape(mynparray.shape)
shrarr_ptr[:] = mynparray  # copy the data into the shared buffer (rebinding the name would not)

where in my case, mynparray is 3-D. As for the actual sharing, I used the following style and it has worked so far.

from multiprocessing import Process, Queue

inq1 = Queue()
inq2 = Queue()
outq = Queue()
p1 = Process(target=myfunc1, args=(inq1, outq,))
p1.start()
inq1.put((shrarr_ptr,))
p2 = Process(target=myfunc2, args=(inq2, outq,))
p2.start()
inq2.put((shrarr_ptr,))
inq1.close()
inq2.close()
inq1.join_thread()
inq2.join_thread()
....
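For completeness, a hypothetical consumer in this style (myfunc1 and myfunc2 are not shown in the original) would just pull the tuple off its queue and read the array:

def myfunc1(inq, outq):
    (arr,) = inq.get()   # the tuple sent by inq1.put((shrarr_ptr,))
    outq.put(arr.sum())  # read-only use of the data

One caveat: objects put on a Queue are pickled, and pickling a numpy array copies its data. For strictly zero-copy sharing, pass the RawArray itself in Process(..., args=...) at creation time and rebuild the view with np.frombuffer inside the child, as sketched earlier.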

Note that if you plan to work with numpy arrays, you can omit RawArray entirely and use:

import ctypes
import numpy as np
from multiprocessing.heap import BufferWrapper

def shared_array(shape, dtype):
    dt = np.dtype((dtype, shape))
    wrapper = BufferWrapper(dt.itemsize)
    mem = wrapper.create_memoryview()

    # workaround for bpo-41673 to keep `wrapper` alive
    ct = (ctypes.c_ubyte * dt.itemsize).from_buffer(mem)
    ct._owner = wrapper
    mem = memoryview(ct)

    return np.asarray(mem).view(dt)
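Continuing from the definition above, a small sketch with np.float16, a dtype that np.ctypeslib.as_ctypes_type cannot convert:

# float16 has no ctypes equivalent, so the RawArray route would fail here
arr = shared_array((3, 4), np.float16)
arr[...] = 1
print(arr.sum())  # 12.0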

The advantage of this approach is that it works in cases where np.ctypeslib.as_ctypes_type would fail.

This does copy the data internally (the RawArray initializer is copied element by element), but you can pass the flat array:

>>> import ctypes
>>> import numpy
>>> from multiprocessing.sharedctypes import RawArray
>>> a = numpy.random.randint(1, 10, (4, 4))
>>> a
array([[5, 6, 7, 7],
       [7, 9, 2, 8],
       [3, 4, 6, 4],
       [3, 1, 2, 2]])
>>> b = RawArray(ctypes.c_long, a.flat)
>>> b[:]
[5, 6, 7, 7, 7, 9, 2, 8, 3, 4, 6, 4, 3, 1, 2, 2]
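To get a 2-D numpy view of that shared buffer back (for example inside a worker), np.frombuffer plus reshape works without a further copy; a sketch, using the ctypes type to keep the dtype platform-correct:

>>> c = numpy.frombuffer(b, dtype=numpy.dtype(ctypes.c_long)).reshape(4, 4)
>>> (c == a).all()
True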
