NumPy 2D array from a buffer?

Question

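I have a memory map that contains a 2D array, and I would like to create a numpy array from it. Ideally I want to avoid copying, because the arrays involved can be large.

My code looks like this: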

import mmap
import numpy as np

n_bytes = 10000
tagname = "Some Tag from external System"
map = mmap.mmap(-1, n_bytes, tagname)
offsets = [0, 5000]

columns = []
for offset in offsets:
   #type and count vary in the real code, but for this dummy code I simply made them up. But I know the count and type for every column.
   np_type = np.dtype('f4')
   column_data = np.frombuffer(map, np_type, count=500, offset=offset)
   columns.append(column_data)

# this line seems to copy the data, which I would like to avoid
data = np.array(columns).T

Answer 1

I haven't used frombuffer much, but I think np.array treats those arrays the same way it treats conventionally constructed ones.

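Each column_data array has its own data buffer: the mmap you assigned to it. But np.array(columns) reads the values from each array in the list and constructs a new array from them, with its own data buffer.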

I like to use x.__array_interface__ to look at the data buffer location (and to see other key attributes). Compare that dictionary for each element of columns and for data.
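A rough sketch of that comparison, applied to the layout from the question. It assumes an anonymous mmap (no tag name) as a stand-in for the externally tagged one and reuses the made-up offsets and counts; the names buf, col_addrs and data_addr are illustrative only.

```python
import mmap
import numpy as np

# Anonymous map standing in for the tagged one from the question (assumption).
buf = mmap.mmap(-1, 10000)
offsets = [0, 5000]

# Each column is a view into the mmap, starting at its own offset.
columns = [np.frombuffer(buf, dtype='f4', count=500, offset=off)
           for off in offsets]

# np.array(columns) reads the values and builds a new array with its own buffer.
data = np.array(columns).T

# First element of the 'data' entry is the address of the buffer start.
col_addrs = [c.__array_interface__['data'][0] for c in columns]
data_addr = data.__array_interface__['data'][0]

print("column addresses:", col_addrs)
print("distance between columns:", col_addrs[1] - col_addrs[0])
print("data address:", data_addr)
print("data shares memory with column 0:", np.shares_memory(data, columns[0]))
```

The two column addresses should differ by exactly the offset between them, while the address of data points somewhere else entirely.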

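You can construct a 2D array from an mmap if you use a contiguous block. Just make the 1D frombuffer array and reshape it. Even the transpose will continue to use that buffer (in F order). Slices and views also use it.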
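A minimal sketch of the contiguous-block approach, assuming an anonymous mmap and hypothetical dimensions (4 rows by 3 columns of float32). It is only meant to show that the reshaped array and its transpose keep using the mmap's memory.

```python
import mmap
import numpy as np

n_rows, n_cols = 4, 3                      # hypothetical dimensions
dtype = np.dtype('f4')
buf = mmap.mmap(-1, n_rows * n_cols * dtype.itemsize)  # anonymous, zero-filled

flat = np.frombuffer(buf, dtype=dtype, count=n_rows * n_cols)  # 1D view
grid = flat.reshape(n_rows, n_cols)        # 2D view, C order
grid_t = grid.T                            # transposed view, F order

# Write the first float directly through the mmap; the views see the change.
buf[0:4] = np.float32(1.5).tobytes()

print(grid[0, 0], grid_t[0, 0])
print(np.shares_memory(flat, grid_t))
print(grid_t.flags['F_CONTIGUOUS'])
```

Because nothing is copied, a value written through buf is visible in flat, grid and grid_t alike.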

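But unless you are very careful, you will quickly end up with copies that put the data somewhere else. Simply writing data1 = data+1 creates a new array, and so does advanced indexing such as data[[1,3,5],:]. The same goes for any concatenation.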
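One way to check which operations keep the original buffer is np.shares_memory. The sketch below uses a small bytearray as a stand-in buffer (an assumption, chosen to keep the example self-contained); the variable names are illustrative.

```python
import numpy as np

raw = bytearray(24)                         # stand-in buffer (assumption)
data = np.frombuffer(raw, dtype=np.uint8).reshape(6, 4)

sliced = data[1:4, :]                       # basic slicing
plus_one = data + 1                         # arithmetic
picked = data[[1, 3, 5], :]                 # advanced indexing
joined = np.concatenate([data, data])       # concatenation

results = {
    'sliced': np.shares_memory(data, sliced),
    'plus_one': np.shares_memory(data, plus_one),
    'picked': np.shares_memory(data, picked),
    'joined': np.shares_memory(data, joined),
}
for name, shared in results.items():
    print(name, 'shares the buffer:', shared)
```

Only the basic slice should report that it still shares memory with data; the other three live in freshly allocated buffers.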

Two arrays from bytestring buffers:

In [534]: x=np.frombuffer(b'abcdef',np.uint8)
In [535]: y=np.frombuffer(b'ghijkl',np.uint8)

A new array made by joining them:

In [536]: z=np.array((x,y))

In [538]: x.__array_interface__
Out[538]: 
{'data': (3013090040, True),
 'descr': [('', '|u1')],
 'shape': (6,),
 'strides': None,
 'typestr': '|u1',
 'version': 3}
In [539]: y.__array_interface__['data']
Out[539]: (3013089608, True)
In [540]: z.__array_interface__['data']
Out[540]: (180817384, False)

The data buffer locations for x, y and z are entirely different.

But the data location for the reshaped x does not change:

In [541]: x.reshape(2,3).__array_interface__['data']
Out[541]: (3013090040, True)

Nor does it change for the 2D transpose:

In [542]: x.reshape(2,3).T.__array_interface__
Out[542]: 
{'data': (3013090040, True),
 'descr': [('', '|u1')],
 'shape': (3, 2),
 'strides': (1, 3),
 'typestr': '|u1',
 'version': 3}

Same data, different views:

In [544]: x
Out[544]: array([ 97,  98,  99, 100, 101, 102], dtype=uint8)
In [545]: x.reshape(2,3).T
Out[545]: 
array([[ 97, 100],
       [ 98, 101],
       [ 99, 102]], dtype=uint8)
In [546]: x.reshape(2,3).T.view('S1')
Out[546]: 
array([[b'a', b'd'],
       [b'b', b'e'],
       [b'c', b'f']], 
      dtype='|S1')

Answer 2

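Assuming you have a byte array and you know its dimensions, the answer is very simple. Imagine you have the raw RGB data of an image (24 bits per pixel) in a buffer (named "buff"), and the image dimensions are 1024x768: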

import numpy

# read the buffer into a 1D byte array (buff is assumed to hold 768*1024*3 bytes)
arr = numpy.frombuffer(buff, dtype=numpy.uint8)
# now shape the array as you please: (rows, columns, channels)
arr.shape = (768, 1024, 3)
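A small self-contained variant of the same idea, assuming a made-up 4x2 "image" whose bytes are simply 0..23 so the indexing is easy to follow. It uses reshape instead of assigning to shape; both leave the data in place.

```python
import numpy as np

width, height = 4, 2                         # hypothetical tiny image
buff = bytes(range(width * height * 3))      # fake RGB data: bytes 0..23

# Rows come first, then columns, then the three colour channels.
img = np.frombuffer(buff, dtype=np.uint8).reshape(height, width, 3)

# Pixel at row 1, column 2 starts at byte offset (1 * width + 2) * 3.
pixel = img[1, 2]
print(img.shape)
print(pixel)
print(img.flags.writeable)   # a bytes object is immutable, so the view is read-only
```

Note that an array built on an immutable bytes object cannot be written to; a bytearray or a writable mmap gives a writable view.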
