How to combine two huge numpy arrays without concat, stack, or append?

Question

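I have two huge numpy arrays, each of shape (7, 960000, 200). I want to concatenate them with np.concatenate((arr1, arr2), axis=1) so that the final shape is (7, 1920000, 200). The problem is that they already fill my RAM, and there is not enough room left for the concatenation, so the process gets killed. The same happens with np.stack. So I thought of creating a new array that points to the two arrays in sequence; this new array should behave the same as the merged array, and the data should also be contiguous.

So, how can I do this? And is there a better way to combine them than the idea I suggested?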
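
For scale, here is a rough estimate of the memory involved. It is only a sketch: the dtype is not stated in the question, so float64 (numpy's default) is an assumption, and the variable names are illustrative.

```python
import math

import numpy as np

shape = (7, 960000, 200)
# Assumption: float64 elements (8 bytes each); the question does not state the dtype.
itemsize = np.dtype(np.float64).itemsize

one_array = math.prod(shape) * itemsize   # bytes held by arr1 (and by arr2)
merged = 2 * one_array                    # bytes needed for the concatenated result
peak = 2 * one_array + merged             # both inputs plus the output alive at once

for label, nbytes in [("one array", one_array), ("merged", merged), ("peak", peak)]:
    print(f"{label}: {nbytes / 1e9:.2f} GB")
```

np.concatenate allocates a new output array while both inputs are still alive, so under this assumption the peak is about four times the size of one input.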

Answer 1

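numpy.memmap() lets you create memory-mapped data that is stored on disk in binary form and can be accessed and used like a single array. This solution saves each of the arrays you are working with as a separate .npy file and then combines them into one binary file.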

import numpy as np
import os

size = (7,960000,200)

# We assume arrays a and b share the same shape; if they do not,
# see https://stackoverflow.com/questions/50746704/how-to-merge-very-large-numpy-arrays
# for an explanation on how to create the new shape

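a = np.ones(size) # uses ~10.75 GB RAM as float64 (7 * 960000 * 200 * 8 bytes)
a = np.transpose(a, (1,0,2)) # move the axis to be merged to the front
# a.shape is an immutable tuple, so build the doubled shape as a new tuple
shape = (a.shape[0] * 2,) + a.shape[1:]
dtype = a.dtype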

np.save('a.npy', a)
a = None # allows for data to be deallocated by garbage collector

b = np.ones(size) # uses ~10.75 GB RAM as float64
b = np.transpose(b, (1,0,2))
np.save('b.npy', b) # save b (a has already been set to None)
b = None

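# Once the size is known, create the memmap and write the chunks
data_files = ['a.npy', 'b.npy']
merged = np.memmap('merged.dat', dtype=dtype, mode='w+', shape=shape)
i = 0
for file in data_files:
    chunk = np.load(file) # loads one full array into RAM at a time
    merged[i:i+len(chunk)] = chunk
    i += len(chunk)
merged.flush() # make sure everything is written to merged.dat

# Transposed view with shape (7, 1920000, 200); no data is copied
merged = np.transpose(merged, (1,0,2))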

# Delete temporary numpy .npy files
os.remove('a.npy')
os.remove('b.npy')
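
As a small-scale check of the same idea, the sketch below writes two tiny arrays straight into the axis-1 slices of a memmap that already has the final shape, then compares the result with np.concatenate. This is a variant rather than the exact procedure above: it skips the transposes, and the array sizes, the temporary directory, and the file name merged_demo.dat are all illustrative assumptions.

```python
import os
import shutil
import tempfile

import numpy as np

# Tiny stand-ins for the real (7, 960000, 200) arrays.
a = np.arange(2 * 3 * 4, dtype=np.float64).reshape(2, 3, 4)
b = -np.arange(2 * 5 * 4, dtype=np.float64).reshape(2, 5, 4)

# Final shape: axis 1 is the sum of the two inputs' axis-1 lengths.
out_shape = (a.shape[0], a.shape[1] + b.shape[1], a.shape[2])

tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "merged_demo.dat")  # hypothetical file name

# Create the disk-backed array and fill each half of axis 1 in turn.
merged = np.memmap(path, dtype=a.dtype, mode="w+", shape=out_shape)
merged[:, :a.shape[1], :] = a
merged[:, a.shape[1]:, :] = b
merged.flush()
del merged  # drop the writable mapping

# A raw memmap file has no header, so dtype and shape must be supplied again.
reopened = np.memmap(path, dtype=np.float64, mode="r", shape=out_shape)
matches = np.array_equal(reopened, np.concatenate((a, b), axis=1))
is_contiguous = bool(reopened.flags["C_CONTIGUOUS"])
print(matches, is_contiguous)

del reopened
shutil.rmtree(tmp_dir)  # remove the temporary demo file
```

With the real data, a and b would still have to be brought into RAM one at a time (for example from the saved .npy files), since the point is that both never need to sit in memory next to a full-size result.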

Based on: this Stack Overflow answer
It is also worth looking at HDF5, including how to merge two HDF5 files; it is another good way to store large datasets.
