How to combine two huge numpy arrays without concat, stack, or append?

I have two huge NumPy arrays, each with shape (7, 960000, 200). I want to concatenate them with np.concatenate((arr1, arr2), axis=1) so that the final shape is (7, 1920000, 200). The problem is that the two arrays already fill my RAM, so there is not enough room left for the concatenation and the process gets killed. The same happens with np.stack. So I thought of creating a new array that points to the two arrays in order and behaves as if they had been combined; it should also be contiguous.
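A quick back-of-the-envelope check shows why the process is killed. This sketch assumes float64 data (NumPy's default floating dtype) and uses the fact that np.concatenate always allocates a brand-new output array while both inputs are still alive.

```python
import math

import numpy as np

shape = (7, 960000, 200)                   # per-array shape from the question
itemsize = np.dtype(np.float64).itemsize   # 8 bytes; float64 is an assumption

bytes_per_input = math.prod(shape) * itemsize  # 10,752,000,000 bytes
bytes_output = 2 * bytes_per_input             # np.concatenate allocates a new array

# At the moment of the copy, both inputs and the new output are alive.
peak_bytes = 2 * bytes_per_input + bytes_output

print(f"each input: {bytes_per_input / 1e9:.2f} GB")  # 10.75 GB
print(f"output:     {bytes_output / 1e9:.2f} GB")     # 21.50 GB
print(f"peak:       {peak_bytes / 1e9:.2f} GB")       # 43.01 GB
```

So the concatenation needs roughly twice the memory the two inputs already occupy.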

So, how can I do that? And is there a better way to combine them than the idea I suggested?
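On the "better way" question: a NumPy array is a single buffer described by one set of strides, so it cannot be a view spanning two separately allocated arrays. If you control how the arrays are produced, one alternative is to allocate the final array once and have each half written straight into its slice. The sketch below uses toy sizes, and the fill values stand in for hypothetical producers of your real data. It still needs RAM for the full result (about 21.5 GB at float64 for the real shape); if that does not fit, the same slice-writing pattern works on a disk-backed np.memmap.

```python
import numpy as np

# Toy sizes standing in for (7, 960000, 200); each half has n rows along axis 1.
d0, n, d2 = 7, 4, 3
dtype = np.float64  # matches np.ones' default dtype

# Allocate the final (d0, 2 * n, d2) array once, up front.
merged = np.empty((d0, 2 * n, d2), dtype=dtype)

# Views onto the two halves along axis 1; writing into them writes into merged.
first = merged[:, :n, :]
second = merged[:, n:, :]

# Hypothetical producers: replace these fills with whatever computes your data,
# writing into the views (slice assignment, out= arguments, ...) rather than
# allocating a separate array for each half.
first.fill(1.0)
second.fill(2.0)

print(merged.shape, merged.flags["C_CONTIGUOUS"], np.shares_memory(first, merged))
# (7, 8, 3) True True
```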

NumPy's numpy.memmap() creates a memory-mapped array backed by a binary file on disk, which can be read and written as if it were a single in-memory array. This solution saves each of the arrays you are working with to a separate .npy file and then copies them, one at a time, into a single binary file, so the merged result never has to fit in RAM.
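A minimal sketch of the np.memmap round trip this approach relies on, at toy size. The file name example.dat and the temporary directory are illustrative only. Note that a raw .dat file stores no header, so reopening it requires passing the same dtype and shape again.

```python
import os
import tempfile

import numpy as np

shape = (3, 4)
dtype = np.float32

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example.dat")  # illustrative file name

    # mode="w+" creates (or overwrites) the file and maps it into memory.
    mm = np.memmap(path, dtype=dtype, mode="w+", shape=shape)
    mm[:] = np.arange(12, dtype=dtype).reshape(shape)
    mm.flush()  # push pending writes to disk
    del mm      # close the mapping

    file_size = os.path.getsize(path)  # raw data only: 12 * 4 bytes, no header

    # Reopen read-only: dtype and shape must be supplied again.
    ro = np.memmap(path, dtype=dtype, mode="r", shape=shape)
    total = float(ro.sum())  # pages are read from disk on demand
    del ro
```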

import numpy as np
import os

size = (7,960000,200)

# We assume arrays a and b have the same shape. If they do not,
# see https://stackoverflow.com/questions/50746704/how-to-merge-very-large-numpy-arrays
# for an explanation of how to compute the merged shape.

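a = np.ones(size)  # float64 by default, so ~10.75 GB of RAM
# Move the concatenation axis (axis 1) to the front: each saved array then
# becomes one contiguous block along axis 0 of the merged file
a = np.transpose(a, (1, 0, 2))
shape = (2 * a.shape[0],) + a.shape[1:]  # shapes are tuples, so build a new one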
dtype = a.dtype

np.save('a.npy', a)
a = None  # drop the only reference so the ~10.75 GB buffer can be freed

b = np.ones(size)  # float64 by default, so ~10.75 GB of RAM
b = np.transpose(b, (1, 0, 2))
np.save('b.npy', b)
b = None

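# Once the merged shape is known, create the memmap and copy each file in as a chunk
data_files = ['a.npy', 'b.npy']
merged = np.memmap('merged.dat', dtype=dtype, mode='w+', shape=shape)
i = 0
for file in data_files:
    # mmap_mode='r' maps the .npy file instead of reading it all into RAM
    chunk = np.load(file, mmap_mode='r')
    merged[i:i+len(chunk)] = chunk
    i += len(chunk)
    del chunk  # release the mapping so the file can be deleted below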

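merged.flush()  # make sure all chunks are written to merged.dat
# Swap the axes back to get a (7, 1920000, 200) view of the merged data
merged = np.transpose(merged, (1, 0, 2))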

# Delete temporary numpy .npy files
os.remove('a.npy')
os.remove('b.npy')
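The question also asks for the result to be contiguous in the (7, 1920000, 200) shape, which the transposed memmap view is not. One alternative sketch, assuming it is acceptable to store the result as a .npy file, is to allocate the target layout directly with np.lib.format.open_memmap and write each input into its slice along axis 1, so no transpose is needed. Toy sizes are used here; with the real data, a and b would come from np.load(..., mmap_mode='r') on .npy files saved in their original (7, 960000, 200) layout, so neither has to be held in RAM as a whole.

```python
import os
import tempfile

import numpy as np

# Toy stand-ins for the two (7, 960000, 200) arrays.
d0, n, d2 = 7, 5, 4
a = np.ones((d0, n, d2))
b = np.full((d0, n, d2), 2.0)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "merged.npy")  # illustrative file name

    # open_memmap writes a proper .npy header, then maps the data region.
    merged = np.lib.format.open_memmap(
        path, mode="w+", dtype=a.dtype, shape=(d0, 2 * n, d2)
    )
    # Write each input into its slice along axis 1; no transpose needed.
    merged[:, :n, :] = a
    merged[:, n:, :] = b
    merged.flush()
    del merged

    # Reopen lazily: a C-contiguous (d0, 2n, d2) array backed by the file.
    result = np.load(path, mmap_mode="r")
    same = np.array_equal(result, np.concatenate((a, b), axis=1))
    out_shape = result.shape
    contiguous = bool(result.flags["C_CONTIGUOUS"])
    del result
```

Because the file carries a .npy header, a later session can reopen it with np.load('merged.npy', mmap_mode='r') without restating the dtype or shape.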

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
Guangdong ICP Registration No. 18138465  © 2020-2024 STACKOOM.COM