
Saving multiple Numpy arrays to a Numpy binary file (Python)

I want to save multiple large numpy arrays to a numpy binary file to prevent my code from crashing, but the file seems to keep getting overwritten when I add an array. The last array saved is what allarrays is set to when save.npy is opened and read. Here is my code:

import numpy as np

with open('save.npy', 'wb') as f:
    for num in range(500):
        array = np.random.rand(100, 400)
        np.save(f, array)

with open('save.npy', 'rb') as f:
    allarrays = np.load(f)

If the file existed before, I want it to be overwritten when the code is rerun. That's why I chose 'wb' instead of 'ab'.

alist = []
with open('save.npy', 'rb') as f:
    alist.append(np.load(f))

When you load, you have to collect all the loads in a list or something similar. np.load only loads one array, starting at the current file position.
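For example, a minimal sketch of that idea, assuming the writing loop from the question (so the count of 500 and the array shape are known in advance):

import numpy as np

alist = []
with open('save.npy', 'rb') as f:
    for _ in range(500):              # one np.load per np.save call in the writing loop
        alist.append(np.load(f))

allarrays = np.stack(alist)           # shape (500, 100, 400)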

You can try memory mapping to disk.

import numpy as np

# merge arrays using a memory-mapped file
mm = np.memmap("mmap.bin", dtype='float32', mode='w+', shape=(500, 100, 400))
for num in range(500):
    mm[num] = np.random.rand(100, 400)   # write one 100x400 slice per iteration

# save the final merged array to a single npy file
with open('save.npy', 'wb') as f:
    np.save(f, mm[:])
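To read the merged result back without pulling everything into RAM at once, one option (a sketch, not part of the original answer) is np.load with mmap_mode:

# open the saved file as a read-only memory map instead of reading it all into memory
arr = np.load('save.npy', mmap_mode='r')
print(arr.shape)        # (500, 100, 400)
print(arr[0].mean())    # only the slices you touch are read from disk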

I ran into this problem as well and solved it in a not very neat way, but perhaps it's useful for others. It's inspired by hpaulj's approach, which is incomplete (i.e., it doesn't load the data). Perhaps this is not how one is supposed to solve this problem to begin with... but anyhow, read on.

I had saved my data using a similar procedure to the OP's:

# Saving the data in a for-loop
with open(savefilename, 'wb') as f:
    for datafilename in list_of_datafiles:
        # Do the processing
        data_to_save = ...
        np.save(f, data_to_save)   # save to the open file handle, not the filename

And I ran into the problem that calling np.load() only loaded the last saved array, none of the rest. However, I knew that the data was in principle contained in the *.npy file, given that the file size kept growing during the saving loop. What was required was simply to loop over the content of the file while calling the load command repeatedly. As I didn't quite know how many arrays were contained in the file, I simply kept looping over the load call until it failed. It's hacky, but it works.

# Loading the data in a for-loop
data_to_read = []
with open(savefilename, 'rb') as f:   # binary mode is required for np.load
    while True:
        try:
            data_to_read.append(np.load(f))
        except Exception:
            print("all data has been read!")
            break

Then you can call, e.g., len(data_to_read) to see how many arrays are contained in it. Calling, e.g., data_to_read[0] gives you the first saved array, etc.
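If every saved array has the same shape (as in the OP's case), a small follow-up sketch can combine the list into a single array:

allarrays = np.stack(data_to_read)    # works only if every saved array has the same shape
print(len(data_to_read), allarrays.shape)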
