
Saving multiple Numpy arrays to a Numpy binary file (Python)

I want to save multiple large numpy arrays to a single numpy binary file to prevent my code from crashing, but it seems like the file keeps getting overwritten each time I add an array: only a single array ends up in allarrays when save.npy is opened and read. Here is my code:

import numpy as np

with open('save.npy', 'wb') as f:
    for num in range(500):
        array = np.random.rand(100, 400)
        np.save(f, array)

with open('save.npy', 'rb') as f:
    allarrays = np.load(f)

If the file existed before, I want it to be overwritten if the code is rerun. That's why I chose 'wb' instead of 'ab'.

alist = []
with open('save.npy', 'rb') as f:
    alist.append(np.load(f))

When you load, you have to collect all the loads in a list or something similar. np.load only loads one array, starting at the current file position.
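
For example, here is a minimal sketch of that idea, assuming you know how many arrays were saved (500 arrays of shape (100, 400), as in the question's code):

import numpy as np

alist = []
with open('save.npy', 'rb') as f:
    for _ in range(500):           # one np.load call per saved array
        alist.append(np.load(f))

allarrays = np.stack(alist)        # combined into one (500, 100, 400) array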

You can try memory mapping to disk.

import numpy as np

# merge arrays using a memory-mapped file
mm = np.memmap("mmap.bin", dtype='float32', mode='w+', shape=(500, 100, 400))
for num in range(500):
    mm[num] = np.random.rand(100, 400)   # write one slice per iteration (values are cast to float32)

# save the final array to an npy file
with open('save.npy', 'wb') as f:
    np.save(f, mm[:])
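
Since mm[:] is written out as a single (500, 100, 400) array, reading it back needs only one call:

allarrays = np.load('save.npy')    # ndarray of shape (500, 100, 400)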

I ran into this problem as well, and solved it in a not-very-neat way, but perhaps it's useful for others. It's inspired by hpaulj's approach, which is incomplete (i.e., it only shows loading a single array). Perhaps this is not how one is supposed to solve this problem to begin with... but anyhow, read on.

I had saved my data using a procedure similar to the OP's:

# Saving the data in a for-loop
with open(savefilename, 'wb') as f:
    for datafilename in list_of_datafiles:
        # Do the processing
        data_to_save = ...
        np.save(f, data_to_save)   # save to the open file handle, not the filename

And ran into the problem that calling np.load() only loaded a single saved array, none of the rest. However, I knew that the data was in principle contained in the *.npy file, given that the file size kept growing during the saving loop. All that was required was to loop over the contents of the file, calling the load command repeatedly. As I didn't quite know how many arrays were contained in the file, I simply ran the loading loop until it failed. It's hacky, but it works.

# Loading the data in a for-loop
data_to_read = []
with open(savefilename, 'rb') as f:
    while True:
        try:
            data_to_read.append(np.load(f))
        except (EOFError, ValueError):   # np.load raises once no data is left in the file
            print("all data has been read!")
            break

Then you can call, e.g., len(data_to_read) to see how many arrays it contains. Calling, e.g., data_to_read[0] gives you the first saved array, and so on.
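
For instance (a quick sketch, assuming the loading loop above has run and all saved arrays share a shape):

print(len(data_to_read))              # number of arrays recovered
first = data_to_read[0]               # the first saved array
allarrays = np.stack(data_to_read)    # optionally combine them into one array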
