Using numpy.save to store 3D NumPy arrays with labels

Question

I would like to write NumPy arrays with shape (3, 225, 400) into a binary file.

These arrays are generated from a screen buffer, and each screen has a label. My goal is to save each screen together with its label.

numpy.save takes only two arguments: the file pointer and the array to be saved. The only option seems to be appending the label to the array, as follows:

with open(file, 'wb') as f:
   np.save(f, np.append(buffer, [label]) )

However, I would prefer not to do this. Another approach might be to save only the array and then write "\t label" after it, like a regular binary write:

with open(file, 'wb') as f:
    np.save(f, buffer)
    # A file opened in binary mode only accepts bytes, so encode the text.
    f.write(("\t" + label).encode())

I am not sure whether np.save moves the file pointer to a new line after saving.

Given that I will be saving hundreds of thousands of array-label pairs at a high frequency, what would you suggest in terms of efficiency?

Answer 1

One option is to save to a NumPy archive (NPZ) file. I have included an example below. np.savez and np.savez_compressed allow one to save multiple arrays to one file.

import numpy as np

# Create fake data.
rng = np.random.RandomState(0)
buffer = rng.normal(size=(3, 225, 400))
label = "this is the label"

# Save. Can use np.savez here instead.
np.savez_compressed("output.npz", buffer=buffer, label=label)

# Load.
npzfile = np.load("output.npz")

np.testing.assert_equal(npzfile["buffer"], buffer)
np.testing.assert_equal(npzfile["label"], label)
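Since the question mentions hundreds of thousands of pairs, one file per pair may become unwieldy. A minimal sketch of one alternative, assuming the screens can be collected in memory in batches first: stack the buffers along a new leading axis and store the labels as a parallel array in the same NPZ file. The batch size, the file name and the names buffers and labels are hypothetical choices for illustration, not part of the original answer.

```python
import os
import tempfile

import numpy as np

# Fake batch of screens; n_screens is a hypothetical batch size.
rng = np.random.default_rng(0)
n_screens = 4
buffers = rng.normal(size=(n_screens, 3, 225, 400))
labels = np.array([f"label {i}" for i in range(n_screens)])

# One archive holds the whole batch: buffers[i] belongs to labels[i].
path = os.path.join(tempfile.mkdtemp(), "batch.npz")
np.savez(path, buffers=buffers, labels=labels)

# Load the batch back; the context manager closes the archive afterwards.
with np.load(path) as npzfile:
    loaded_buffers = npzfile["buffers"]
    loaded_labels = npzfile["labels"]
```

Swapping np.savez for np.savez_compressed trades write speed for a smaller file, which may or may not pay off at a high save frequency.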

Another option is to use HDF5 via h5py. An HDF5 file is organized like a filesystem: the root is / and datasets can be created with names like /data/buffers/dataset1. One way of organizing the buffers and labels is to create a dataset for each buffer and set a label attribute on it.

import h5py
import numpy as np

# Create fake data.
rng = np.random.RandomState(0)
buffer = rng.normal(size=(3, 225, 400))
label = "this is the label"

this_dataset = "/buffers/0"

# Save to HDF5.
with h5py.File("output.h5", "w") as f:
    f.create_dataset(this_dataset, data=buffer, compression="lzf")
    f[this_dataset].attrs.create("label", label)

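# Load. Index with [()] to read the data into memory; a bare dataset
# handle becomes unusable once the file is closed.
with h5py.File("output.h5", "r") as f:
    loaded_buffer = f[this_dataset][()]
    loaded_label = f[this_dataset].attrs["label"]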
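On the question's point about the file pointer: np.save writes one complete .npy record at the current position of an open file and leaves the pointer right after it, so several records can be written back to back and read again with repeated np.load calls. A minimal sketch, assuming the label is saved as its own small array instead of raw "\t label" text (raw text between records would probably confuse the next np.load). The file name and the three-pair loop are hypothetical.

```python
import os
import tempfile

import numpy as np

# Fake data: three (buffer, label) pairs.
rng = np.random.default_rng(0)
pairs = [(rng.normal(size=(3, 225, 400)), f"label {i}") for i in range(3)]

path = os.path.join(tempfile.mkdtemp(), "pairs.npy")

# Write: each np.save call adds one record, so buffers and labels alternate.
with open(path, "wb") as f:
    for buffer, label in pairs:
        np.save(f, buffer)
        np.save(f, np.array(label))

# Read: each np.load call consumes exactly one record from the open file.
loaded = []
with open(path, "rb") as f:
    for _ in range(len(pairs)):
        buffer = np.load(f)
        label = np.load(f).item()  # 0-d string array -> Python str
        loaded.append((buffer, label))
```

The reader has to know how many records to expect (or stop when np.load fails at end of file), which is one reason the NPZ and HDF5 layouts above may be easier to work with.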

Answer 1 (posted 2020-12-10 23:39:13)
