Making my NumPy array shared across processes
I have read quite a few of the questions about shared arrays, and it seems simple enough for basic arrays, but I am still trying to get it to work for the array I actually have.
import numpy as np
data=np.zeros(250,dtype='float32, (250000,2)float32')
I have tried to convert it into a shared array by somehow making mp.Array accept data, and I have also tried creating the array using ctypes:
import multiprocessing as mp
data=mp.Array('c_float, (250000)c_float',250)
The only way I managed to get the code to work is not passing data to the function, but passing an encoded string to be uncompressed/decoded; this, however, would end up calling n (number of strings) processes, which seems redundant. My desired implementation is based on slicing the list of binary strings into x (number of processes) chunks and passing each chunk, data, and an index to the processes, which then modify data locally. Hence my question about how to make it shared; any example working with a custom (nested) NumPy array would already be a great help.
PS: This question is a follow-up to Python multiprocessing.
Note that you can start with an array of complex dtype:
In [4]: data = np.zeros(250,dtype='float32, (250000,2)float32')
and view it as an array of homogeneous dtype:
In [5]: data2 = data.view('float32')
and then view it back as an array of complex dtype:
In [7]: data3 = data2.view('float32, (250000,2)float32')
Changing the dtype is a very quick operation; it does not affect the underlying data, only the way NumPy interprets it, so changing the dtype is virtually costless.
So everything you have read about arrays with simple (homogeneous) dtypes can be readily applied to your complex dtype with the trick above.
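To make the "virtually costless" claim concrete, here is a small self-contained sketch (using a (4, 2) block instead of (250000, 2) so it runs instantly) showing that the views share memory and that the round trip preserves the data:

```python
import numpy as np

# A small analogue of the record dtype in the question: one scalar plus a (4, 2) block.
rec_dtype = 'float32, (4,2)float32'
data = np.zeros(5, dtype=rec_dtype)
data['f0'][0] = 1.5

# View as a homogeneous float32 array -- no copy is made.
flat = data.view('float32')
assert flat.base is data              # shares memory with 'data'
assert flat.size == 5 * (1 + 4 * 2)   # 45 float32 values in total

# View back as the record dtype: the original values are intact.
data3 = flat.view(rec_dtype)
assert data3['f0'][0] == 1.5

# Because the views share memory, writing through 'flat' shows up in 'data'.
flat[0] = 2.5
assert data['f0'][0] == 2.5
```

The same byte buffer backs all three names; only the interpretation changes, which is why the dtype switch costs nothing.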
The code below borrows many ideas from J.F. Sebastian's answer.
import numpy as np
import multiprocessing as mp
import contextlib
import ctypes
import struct
import base64

def decode(arg):
    chunk, counter = arg
    print(len(chunk), counter)
    for x in chunk:
        peak_counter = 0
        data_buff = base64.b64decode(x)
        buff_size = len(data_buff) // 4
        unpack_format = ">%dL" % buff_size
        index = 0
        for y in struct.unpack(unpack_format, data_buff):
            buff1 = struct.pack("I", y)
            buff2 = struct.unpack("f", buff1)[0]
            with shared_arr.get_lock():
                data = tonumpyarray(shared_arr).view(
                    [('f0', '<f4'), ('f1', '<f4', (250000, 2))])
                if index % 2 == 0:
                    data[counter][1][peak_counter][0] = float(buff2)
                else:
                    data[counter][1][peak_counter][1] = float(buff2)
                    peak_counter += 1
            index += 1
        counter += 1

def pool_init(shared_arr_):
    global shared_arr
    shared_arr = shared_arr_  # must be inherited, not passed as an argument

def tonumpyarray(mp_arr):
    # The buffer holds c_float values, so view it as float32
    # (np.frombuffer would otherwise default to float64).
    return np.frombuffer(mp_arr.get_obj(), dtype=np.float32)

def numpy_array(shared_arr, peaks):
    """Fills the NumPy array 'data' with m/z-intensity values acquired
    from b64 decoding and unpacking the binary string read from the
    mzXML file, which is stored in the list 'peaks'.

    The m/z values are assumed to be ordered without validating this
    assumption.

    Note: This function uses multi-processing.
    """
    processors = mp.cpu_count()
    with contextlib.closing(mp.Pool(processes=processors,
                                    initializer=pool_init,
                                    initargs=(shared_arr, ))) as pool:
        chunk_size = len(peaks) // processors
        map_parameters = []
        for i in range(processors):
            counter = i * chunk_size
            # WARNING: I removed -1 from (i + 1)*chunk_size, since the right
            # index is non-inclusive.
            chunk = peaks[i*chunk_size : (i + 1)*chunk_size]
            map_parameters.append((chunk, counter))
        pool.map(decode, map_parameters)

if __name__ == '__main__':
    shared_arr = mp.Array(ctypes.c_float, (250000 * 2 * 250) + 250)
    peaks = ...
    numpy_array(shared_arr, peaks)
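As a quick sanity check on the Array length used above: each record of the structured dtype holds 1 + 250000*2 float32 values, and there are 250 records, so the two element counts agree:

```python
import numpy as np

# The structured dtype used in decode(): one float32 plus a (250000, 2) float32 block.
dt = np.dtype([('f0', '<f4'), ('f1', '<f4', (250000, 2))])

# Total float32 elements for 250 records = record size in bytes / 4, times 250.
n_floats = dt.itemsize // 4 * 250
assert n_floats == (250000 * 2 * 250) + 250   # matches the mp.Array length
```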
If you can guarantee that the various processes which execute the assignments

if index % 2 == 0:
    data[counter][1][peak_counter][0] = float(buff2)
else:
    data[counter][1][peak_counter][1] = float(buff2)

never compete to alter the same locations of data, then I believe you can actually forgo using the lock

with shared_arr.get_lock():

but I don't understand your code well enough to know for sure, so to be on the safe side, I included the lock.
from multiprocessing import Process, Array
import numpy as np
import time
import ctypes

def fun(a):
    a[0] = -a[0]
    while 1:
        time.sleep(2)
        # print(bytearray(a.get_obj()))
        c = np.frombuffer(a.get_obj(), dtype=np.float32)
        c.shape = 3, 3
        print('haha', c)

def main():
    a = np.random.rand(3, 3).astype(np.float32)
    a.shape = 1 * a.size
    # a = np.array([[1, 3, 4], [4, 5, 6]])
    # b = bytearray(a)
    h = Array(ctypes.c_float, a)
    print("Originally,", h)

    # Create, start, and finish the child process
    p = Process(target=fun, args=(h,))
    p.start()
    # p.join()
    a.shape = 3, 3
    # Print out the changed values
    print('first', a)
    time.sleep(3)
    # h[0] = h[0] + 1
    print('main', np.frombuffer(h.get_obj(), dtype=np.float32))

if __name__ == "__main__":
    main()
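The key point in the snippet above is that np.frombuffer does not copy: it wraps the very buffer that the Array owns, so writes through either handle are visible through the other. A minimal single-process check (no child process needed):

```python
import ctypes
from multiprocessing import Array

import numpy as np

h = Array(ctypes.c_float, [1.0, 2.0, 3.0, 4.0])
c = np.frombuffer(h.get_obj(), dtype=np.float32).reshape(2, 2)

c[0, 0] = -1.0          # write through the NumPy view...
assert h[0] == -1.0     # ...is visible through the Array

h[3] = 42.0             # and vice versa
assert c[1, 1] == 42.0
```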