
How to combine two huge numpy arrays without concat, stack, or append?

I have two numpy arrays of huge size. Each array has the shape of (7, 960000, 200). I want to concatenate them using np.concatenate((arr1, arr2), axis=1) so that the final shape would be (7, 1920000, 200). The problem is, they already fill up my RAM, and there is not enough room left to do the concatenation, so the process gets killed. The same happens with np.stack. So, I thought of making a new array that points to the two arrays in order; this new array should have the same effect as combining the arrays, and it should be contiguous as well.

So, how can I do this? And is there a better way to combine them than the idea I suggested?

numpy.memmap() allows you to create memory-mapped data stored as a binary file on disk that can be accessed and manipulated as if it were a single in-memory array. This solution saves the individual arrays you are working with as separate .npy files and then combines them into a single binary file.

import numpy as np
import os

size = (7,960000,200)

# We are assuming arrays a and b share the same shape, if they do not 
# see https://stackoverflow.com/questions/50746704/how-to-merge-very-large-numpy-arrays
# for an explanation on how to create the new shape

a = np.ones(size) # ~11 GB of RAM at float64 (7 * 960000 * 200 * 8 bytes)
a = np.transpose(a, (1,0,2))
# Tuples are immutable, so build the doubled shape as a new tuple
shape = (2 * a.shape[0],) + a.shape[1:]
dtype = a.dtype

np.save('a.npy', a)
a = None # allows for data to be deallocated by garbage collector

b = np.ones(size) # ~11 GB of RAM at float64
b = np.transpose(b, (1,0,2))
np.save('b.npy', b)
b = None

# Once the shape is known, create the memmap and write the chunks
data_files = ['a.npy', 'b.npy']
merged = np.memmap('merged.dat', dtype=dtype, mode='w+', shape=shape)
i = 0
for file in data_files:
    chunk = np.load(file)  # plain numeric arrays do not need allow_pickle
    merged[i:i+len(chunk)] = chunk
    i += len(chunk)

merged.flush()  # make sure all written chunks reach the disk
merged = np.transpose(merged, (1,0,2))  # view with the final shape (7, 1920000, 200)

# Delete temporary numpy .npy files
os.remove('a.npy')
os.remove('b.npy')
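If even a single source array at a time is too large to load comfortably, a variation of the above streams the saved arrays through np.load(..., mmap_mode='r') so that neither source is ever fully resident in RAM. This is only a sketch: the shapes are deliberately tiny for illustration, and the step size is a hypothetical knob you would tune to your memory budget.

```python
import numpy as np
import os

size = (7, 100, 5)   # tiny illustrative shape; use (7, 960000, 200) in practice
step = 32            # rows copied per iteration; tune to your memory budget

# Save the two (transposed) source arrays, then free them
a = np.transpose(np.ones(size), (1, 0, 2))
b = np.transpose(np.zeros(size), (1, 0, 2))
np.save('a.npy', a)
np.save('b.npy', b)
del a, b

shape = (2 * size[1], size[0], size[2])
merged = np.memmap('merged.dat', dtype=np.float64, mode='w+', shape=shape)

offset = 0
for name in ('a.npy', 'b.npy'):
    src = np.load(name, mmap_mode='r')  # memory-mapped view, not loaded into RAM
    for start in range(0, len(src), step):
        stop = min(start + step, len(src))
        merged[offset + start:offset + stop] = src[start:stop]
    offset += len(src)
merged.flush()

combined = np.transpose(merged, (1, 0, 2))  # view with shape (7, 200, 5)

# Remove the temporary .npy files; merged.dat stays as the backing store
os.remove('a.npy')
os.remove('b.npy')
```

Because both sides of the copy are memory-mapped, only about `step` rows are ever materialized at once, at the cost of extra disk I/O.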
