Making my NumPy array shared across processes
I have read quite a few of the questions about shared arrays, and it seems simple enough for basic arrays, but I am still trying to get it to work for the array I actually have.
import numpy as np
data=np.zeros(250,dtype='float32, (250000,2)float32')
I have tried to convert it into a shared array by somehow making mp.Array accept data, and I have also tried creating the array using ctypes:
import multiprocessing as mp
data=mp.Array('c_float, (250000)c_float',250)
The only way I managed to get the code to work is not passing data to the function, but passing an encoded string to be uncompressed/decoded; this, however, would end up calling n (number of strings) processes, which seems redundant. My desired implementation is based on slicing the list of binary strings into x (number of processes) chunks and passing each chunk, data, and an index to the processes, which then modify data locally. Hence my question about how to make it shared; any example working with a custom (nested) NumPy array would already be a great help.
PS: This question is a follow-up to Python multiprocessing.
Note that you can start with an array of complex dtype:
In [4]: data = np.zeros(250,dtype='float32, (250000,2)float32')
and view it as an array of homogeneous dtype:
In [5]: data2 = data.view('float32')
and then view it back as an array of complex dtype:
In [7]: data3 = data2.view('float32, (250000,2)float32')
Changing the dtype is a very quick operation; it does not affect the underlying data, only the way NumPy interprets it, so changing the dtype is virtually costless.
So everything you have read about arrays with simple (homogeneous) dtypes can be readily applied to your complex dtype with the trick above.
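To make the "virtually costless" claim concrete, here is a small self-contained sketch (using a (4, 2) block instead of (250000, 2) so it runs instantly) showing that the views share memory and that the round trip preserves the data:

```python
import numpy as np

# A small analogue of the record dtype in the question: one scalar plus a (4, 2) block.
rec_dtype = 'float32, (4,2)float32'
data = np.zeros(5, dtype=rec_dtype)
data['f0'][0] = 1.5

# View as a homogeneous float32 array -- no copy is made.
flat = data.view('float32')
assert flat.base is data              # shares memory with 'data'
assert flat.size == 5 * (1 + 4 * 2)   # 45 float32 values in total

# View back as the record dtype: the original values are intact.
data3 = flat.view(rec_dtype)
assert data3['f0'][0] == 1.5

# Because the views share memory, writing through 'flat' shows up in 'data'.
flat[0] = 2.5
assert data['f0'][0] == 2.5
```

The same byte buffer backs all three names; only the interpretation changes, which is why the dtype switch costs nothing.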
The code below borrows many ideas from J.F. Sebastian's answer.
import numpy as np
import multiprocessing as mp
import contextlib
import ctypes
import struct
import base64

def decode(arg):
    chunk, counter = arg
    print(len(chunk), counter)
    for x in chunk:
        peak_counter = 0
        data_buff = base64.b64decode(x)
        buff_size = len(data_buff) // 4
        unpack_format = ">%dL" % buff_size
        index = 0
        for y in struct.unpack(unpack_format, data_buff):
            buff1 = struct.pack("I", y)
            buff2 = struct.unpack("f", buff1)[0]
            with shared_arr.get_lock():
                data = tonumpyarray(shared_arr).view(
                    [('f0', '<f4'), ('f1', '<f4', (250000, 2))])
                if index % 2 == 0:
                    data[counter][1][peak_counter][0] = float(buff2)
                else:
                    data[counter][1][peak_counter][1] = float(buff2)
                    peak_counter += 1
            index += 1
        counter += 1

def pool_init(shared_arr_):
    global shared_arr
    shared_arr = shared_arr_  # must be inherited, not passed as an argument

def tonumpyarray(mp_arr):
    # The buffer holds c_float values, so view it as float32
    # (np.frombuffer would otherwise default to float64).
    return np.frombuffer(mp_arr.get_obj(), dtype=np.float32)

def numpy_array(shared_arr, peaks):
    """Fills the NumPy array 'data' with m/z-intensity values acquired
    from b64 decoding and unpacking the binary string read from the
    mzXML file, which is stored in the list 'peaks'.

    The m/z values are assumed to be ordered without validating this
    assumption.

    Note: This function uses multi-processing.
    """
    processors = mp.cpu_count()
    with contextlib.closing(mp.Pool(processes=processors,
                                    initializer=pool_init,
                                    initargs=(shared_arr, ))) as pool:
        chunk_size = len(peaks) // processors
        map_parameters = []
        for i in range(processors):
            counter = i * chunk_size
            # WARNING: I removed -1 from (i + 1)*chunk_size, since the right
            # index is non-inclusive.
            chunk = peaks[i*chunk_size : (i + 1)*chunk_size]
            map_parameters.append((chunk, counter))
        pool.map(decode, map_parameters)

if __name__ == '__main__':
    shared_arr = mp.Array(ctypes.c_float, (250000 * 2 * 250) + 250)
    peaks = ...
    numpy_array(shared_arr, peaks)
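As a quick sanity check on the Array length used above: each record of the structured dtype holds 1 + 250000*2 float32 values, and there are 250 records, so the two element counts agree:

```python
import numpy as np

# The structured dtype used in decode(): one float32 plus a (250000, 2) float32 block.
dt = np.dtype([('f0', '<f4'), ('f1', '<f4', (250000, 2))])

# Total float32 elements for 250 records = record size in bytes / 4, times 250.
n_floats = dt.itemsize // 4 * 250
assert n_floats == (250000 * 2 * 250) + 250   # matches the mp.Array length
```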
If you can guarantee that the various processes which execute the assignments

if index % 2 == 0:
    data[counter][1][peak_counter][0] = float(buff2)
else:
    data[counter][1][peak_counter][1] = float(buff2)

never compete to alter the same locations of data, then I believe you can actually forgo using the lock

with shared_arr.get_lock():

but I don't understand your code well enough to know for sure, so to be on the safe side, I included the lock.
from multiprocessing import Process, Array
import numpy as np
import time
import ctypes

def fun(a):
    a[0] = -a[0]
    while 1:
        time.sleep(2)
        # print(bytearray(a.get_obj()))
        c = np.frombuffer(a.get_obj(), dtype=np.float32)
        c.shape = 3, 3
        print('haha', c)

def main():
    a = np.random.rand(3, 3).astype(np.float32)
    a.shape = 1 * a.size
    # a = np.array([[1, 3, 4], [4, 5, 6]])
    # b = bytearray(a)
    h = Array(ctypes.c_float, a)
    print("Originally,", h)

    # Create, start, and finish the child process
    p = Process(target=fun, args=(h,))
    p.start()
    # p.join()
    a.shape = 3, 3
    # Print out the changed values
    print('first', a)
    time.sleep(3)
    # h[0] = h[0] + 1
    print('main', np.frombuffer(h.get_obj(), dtype=np.float32))

if __name__ == "__main__":
    main()
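The key point in the snippet above is that np.frombuffer does not copy: it wraps the very buffer that the Array owns, so writes through either handle are visible through the other. A minimal single-process check (no child process needed):

```python
import ctypes
from multiprocessing import Array

import numpy as np

h = Array(ctypes.c_float, [1.0, 2.0, 3.0, 4.0])
c = np.frombuffer(h.get_obj(), dtype=np.float32).reshape(2, 2)

c[0, 0] = -1.0          # write through the NumPy view...
assert h[0] == -1.0     # ...is visible through the Array

h[3] = 42.0             # and vice versa
assert c[1, 1] == 42.0
```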