
How to concat many numpy arrays?

I am trying to concatenate many NumPy arrays, each stored in its own file. The problem is that I have a lot of files, and there is not enough memory to create one big array, Data_Array = np.zeros((1000000,7000)), to hold all of them (at the default float64 that is 1,000,000 × 7,000 × 8 bytes = 56 GB). In this question, Combining NumPy arrays, I found that I can use np.concatenate:

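import numpy as np
import matplotlib.pyplot as plt

# Load each file explicitly
file1 = np.load('file1_Path.npy')
file2 = np.load('file2_Path.npy')
file3 = np.load('file3_Path.npy')
file4 = np.load('file4_Path.npy')
# Join the arrays row-wise (they must all have the same number of columns)
dataArray = np.concatenate((file1, file2, file3, file4), axis=0)
print(dataArray)
print(dataArray.shape)
# Plot one line per row of dataArray
plt.plot(dataArray.T)
plt.show()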

This works well, but now I need to replace file1, file2, file3, file4 with all the files in the folder that holds them:

import matplotlib.pyplot as plt
import numpy as np
import glob
import os
fpath = "Path_To_Big_File"
npyfilespath = r'Path_To_Many_Numpy_Files'
os.chdir(npyfilespath)
npfiles = glob.glob("*.npy")
npfiles.sort()
for i, npfile in enumerate(npfiles):
    # npfile is a filename string here, not a loaded array
    dataArray = np.concatenate(npfile, axis=0)
# note: all_arrays is never defined in this script
np.save(fpath, all_arrays)

It gives me this error:

np.concatenate(npfile, axis=0)

ValueError: zero-dimensional arrays cannot be concatenated 

Could you please help me make np.concatenate work here?
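The error comes from passing npfile, a filename string, straight to np.concatenate: NumPy effectively treats the string as a sequence of single characters, each of which becomes a zero-dimensional array. The files have to be loaded with np.load first and the resulting arrays passed as one sequence. A minimal sketch of that direct fix, assuming the combined array still fits in RAM (concat_npy_folder is a hypothetical helper name, not a NumPy function):

```python
import glob
import os

import numpy as np


def concat_npy_folder(folder):
    """Load every .npy file in `folder` (sorted by name) and join them along axis 0."""
    # glob with a joined pattern returns full paths, so os.chdir is not needed
    paths = sorted(glob.glob(os.path.join(folder, "*.npy")))
    if not paths:
        raise FileNotFoundError(f"no .npy files in {folder!r}")
    # np.load turns each filename into an actual array; np.concatenate
    # needs a sequence of arrays, not a single filename string
    arrays = [np.load(p) for p in paths]
    return np.concatenate(arrays, axis=0)
```

Usage would be something like dataArray = concat_npy_folder(r'Path_To_Many_Numpy_Files') followed by np.save("Path_To_Big_File", dataArray); np.save appends .npy if the name lacks it. Note that sorted() orders names lexicographically, so file10.npy comes before file2.npy unless the names are zero-padded. This still needs all the data in memory at once, which is exactly what the memmap approach avoids.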

If you want to work with arrays larger than your memory, use np.memmap instead of loading all the data into RAM. A memmap is backed by a file on disk: NumPy reads and writes the parts you touch on demand, so the whole array never has to be in memory at once. For example, you can create a memory-mapped array like this:

import numpy as np

# np.int was removed in NumPy 1.24; use an explicit dtype that matches your .npy files
a = np.memmap('myFile', dtype=np.int64, mode='w+', shape=(1000000, 8000))

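You can then use a like a normal NumPy array; the limit becomes your disk space rather than your RAM (with 8-byte elements this shape needs 1,000,000 × 8,000 × 8 bytes = 64 GB of disk). The data lives in the file myFile, which you can read later by opening it again with mode='r'. Because this raw file stores no dtype or shape information, pass the same dtype and shape you used when creating it. More info about memmap here: https://docs.scipy.org/doc/numpy/reference/generated/numpy.memmap.html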
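A small round trip illustrating the write-then-reopen pattern. The shape is shrunk to (4, 8) so it runs instantly, the file is placed in a temporary directory, and float64 is an assumed dtype; since a raw memmap file has no header, the same dtype and shape are passed again when reopening in read-only mode.

```python
import os
import tempfile

import numpy as np

shape = (4, 8)        # stand-in for (1000000, 8000)
dtype = np.float64    # assumption: must be the same on write and on read

with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "myFile")

    # mode='w+' creates (or overwrites) the file, sized to hold the whole array
    a = np.memmap(path, dtype=dtype, mode="w+", shape=shape)
    a[:] = np.arange(a.size, dtype=dtype).reshape(shape)
    a.flush()         # push pending writes to the file
    del a             # release the mapping

    # The raw file has no header, so dtype and shape are supplied again
    b = np.memmap(path, dtype=dtype, mode="r", shape=shape)
    row = np.array(b[1, :3])        # copy a few values out of the mapping
    size = os.path.getsize(path)    # 4 * 8 * 8 = 256 bytes
    del b             # close the mapping before the directory is removed
```

Here row holds 8.0, 9.0 and 10.0 (the start of the second row) and size is 256. If the dtype or shape passed on reopening differs from the original, NumPy will not complain; it will simply reinterpret the bytes, which is why they are worth storing alongside the file.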

To fill that array from .npy files of shape (1, 8000), one file per row, write:

for i, npFile in enumerate(npfiles):
    # each file holds one row of shape (1, 8000); flatten it to (8000,)
    a[i, :] = np.load(npFile).reshape(-1)
a.flush()

The flush method ensures that all changes have been written to disk.
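Putting the pieces together for the original question, here is a sketch that streams every .npy file in a folder into one on-disk array. Two assumptions go beyond the answer above: each file may hold any number of rows (all files sharing the first file's column count and dtype), and the output is created with np.lib.format.open_memmap instead of np.memmap. That function writes a standard .npy header, so the result can later be reopened with np.load(path, mmap_mode='r') without restating dtype and shape. stack_npy_to_disk is a hypothetical name.

```python
import glob
import os

import numpy as np


def stack_npy_to_disk(src_folder, out_path):
    """Concatenate all .npy files in src_folder along axis 0 into the .npy file out_path."""
    paths = sorted(glob.glob(os.path.join(src_folder, "*.npy")))
    if not paths:
        raise FileNotFoundError(f"no .npy files in {src_folder!r}")

    # First pass: open each file memory-mapped just to read its shape
    first = np.load(paths[0], mmap_mode="r")
    n_cols, dtype = first.shape[1], first.dtype   # assumes 2-D files
    del first
    total_rows = 0
    for p in paths:
        arr = np.load(p, mmap_mode="r")
        if arr.ndim != 2 or arr.shape[1] != n_cols:
            raise ValueError(f"{p}: expected (rows, {n_cols}), got {arr.shape}")
        total_rows += arr.shape[0]
        del arr

    # Create the output as a real .npy file (with header) mapped into memory
    out = np.lib.format.open_memmap(
        out_path, mode="w+", dtype=dtype, shape=(total_rows, n_cols)
    )

    # Second pass: copy one input file at a time into its slice of the output
    row = 0
    for p in paths:
        block = np.load(p)              # only this file is held in RAM
        out[row:row + block.shape[0]] = block   # cast to the first file's dtype
        row += block.shape[0]

    out.flush()                         # make sure everything reaches the disk
    del out
    return total_rows, n_cols
```

A call such as stack_npy_to_disk(r'Path_To_Many_Numpy_Files', "Path_To_Big_File.npy") would play the role of the question's np.save(fpath, all_arrays). Keep the output path outside the source folder (and ending in .npy): otherwise a second run would pick up the previous output as one of its inputs. Peak memory is roughly the size of the largest single input file.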

The technical posts on this site are licensed under CC BY-SA 4.0. If you need to reprint, please indicate the site URL or the original address. For any questions, contact: yoyou2525@163.com.
