Using numpy.save to store a 3D NumPy array together with a label
I would like to write NumPy arrays with shape (3, 225, 400) into a binary file. These arrays are generated from a screen buffer, and each screen has a label. My goal is to save each screen together with its label.
numpy.save takes a file and a single array to save, so the only option seems to be appending the label to the array, as follows:
with open(file, 'wb') as f:
    np.save(f, np.append(buffer, [label]))
However, I would prefer not to do this. Another approach might be to save only the array and then write "\t" + label, as in regular binary writing:
with open(file, 'wb') as f:
    np.save(f, buffer)
    # The file is opened in binary mode, so the string must be encoded to bytes.
    f.write(("\t" + label).encode())
I am not sure whether np.save moves the file pointer to a new line after saving.
Considering that I will save hundreds of thousands of array-label pairs at high frequency, what would you suggest in terms of efficiency?
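On the file-pointer question: a file written with np.save is binary, so there are no "lines". Each call writes one self-contained .npy record (header plus raw data) and leaves the file position right after it, and np.load on an open file object reads exactly one record. A minimal sketch, assuming labels are plain strings and using a hypothetical file name screens.bin, that interleaves buffers and labels in one file and reads them back in order (the reader assumes it knows how many pairs were written):

```python
import numpy as np

# Fake data: a few screen buffers with hypothetical string labels.
rng = np.random.default_rng(0)
pairs = [(rng.normal(size=(3, 225, 400)), f"screen-{i}") for i in range(3)]

# Each np.save call appends one complete .npy record to the open file,
# so buffers and labels can simply be interleaved.
with open("screens.bin", "wb") as f:
    for buffer, label in pairs:
        np.save(f, buffer)
        np.save(f, np.array(label))  # 0-d unicode array, no pickling needed

# np.load on a file object reads one record and leaves the file
# position at the start of the next record.
loaded = []
with open("screens.bin", "rb") as f:
    for _ in range(len(pairs)):
        buffer = np.load(f)
        label = np.load(f).item()  # 0-d array -> Python str
        loaded.append((buffer, label))
```

Storing the label as its own small array keeps the buffer's shape and dtype intact, unlike appending the label to the flattened buffer.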
One option is to save to a NumPy (NPZ) file: np.savez and np.savez_compressed allow one to save multiple arrays to one file. I have included an example below.
import numpy as np

# Create fake data.
rng = np.random.RandomState(0)
buffer = rng.normal(size=(3, 225, 400))
label = "this is the label"

# Save. np.savez can be used here instead (uncompressed).
np.savez_compressed("output.npz", buffer=buffer, label=label)

# Load. The label comes back as a 0-d string array; .item() turns it into a str.
with np.load("output.npz") as npzfile:
    np.testing.assert_equal(npzfile["buffer"], buffer)
    np.testing.assert_equal(npzfile["label"].item(), label)
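With hundreds of thousands of pairs, writing one .npz per screen means one file per screen. A sketch of one alternative, assuming screens can be buffered in memory and written in batches (the batch size and the file name batch_0000.npz are hypothetical): stack each batch into a single (N, 3, 225, 400) array and store the labels as a parallel 1-D string array in the same .npz file.

```python
import numpy as np

# Hypothetical batch of screens collected in memory before writing.
rng = np.random.default_rng(0)
buffers = [rng.normal(size=(3, 225, 400)) for _ in range(4)]
labels = [f"screen-{i}" for i in range(4)]

# One .npz holds N array-label pairs: a stacked buffer array and a
# string array whose i-th entry labels the i-th buffer.
np.savez("batch_0000.npz", buffers=np.stack(buffers), labels=np.array(labels))

with np.load("batch_0000.npz") as batch:
    loaded_buffers = batch["buffers"]          # shape (N, 3, 225, 400)
    loaded_labels = batch["labels"].tolist()   # list of Python str
```

Pairing by index keeps the label lookup trivial: loaded_labels[i] belongs to loaded_buffers[i].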
Another option is to use HDF5 via h5py. The organization of an HDF5 file is similar to a filesystem (the root is / and datasets can be created with names like /data/buffers/dataset1). One way of organizing the buffers and labels is to create a dataset for each buffer and set a label attribute on it.
import h5py
import numpy as np

# Create fake data.
rng = np.random.RandomState(0)
buffer = rng.normal(size=(3, 225, 400))
label = "this is the label"
this_dataset = "/buffers/0"

# Save to HDF5. The intermediate group "/buffers" is created automatically.
with h5py.File("output.h5", "w") as f:
    dset = f.create_dataset(this_dataset, data=buffer, compression="lzf")
    dset.attrs.create("label", label)

# Load. [()] reads the data into a NumPy array while the file is still open;
# a bare f[this_dataset] is only a handle that stops working once the file closes.
with h5py.File("output.h5", "r") as f:
    loaded_buffer = f[this_dataset][()]
    loaded_label = f[this_dataset].attrs["label"]
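To avoid creating one dataset per screen, a common h5py pattern is a resizable dataset whose first axis grows as data arrives, plus a parallel dataset of variable-length strings for the labels. A minimal sketch, assuming float64 screens and hypothetical names screens.h5, buffers, and labels:

```python
import h5py
import numpy as np

SCREEN_SHAPE = (3, 225, 400)
rng = np.random.default_rng(0)

with h5py.File("screens.h5", "w") as f:
    # maxshape=None on the first axis makes the dataset resizable;
    # one chunk per screen, compressed with LZF.
    buffers = f.create_dataset(
        "buffers", shape=(0, *SCREEN_SHAPE), maxshape=(None, *SCREEN_SHAPE),
        dtype="f8", chunks=(1, *SCREEN_SHAPE), compression="lzf")
    labels = f.create_dataset(
        "labels", shape=(0,), maxshape=(None,), dtype=h5py.string_dtype())

    for i in range(5):
        n = buffers.shape[0]
        # Grow both datasets by one row, then write the new pair.
        buffers.resize(n + 1, axis=0)
        labels.resize(n + 1, axis=0)
        buffers[n] = rng.normal(size=SCREEN_SHAPE)
        labels[n] = f"screen-{i}"

with h5py.File("screens.h5", "r") as f:
    n_screens = f["buffers"].shape[0]
    # asstr() decodes the stored strings to Python str on read.
    all_labels = f["labels"].asstr()[()].tolist()
```

Resizing by one row per screen keeps the sketch short; at high write rates, growing the datasets in larger steps or buffering a batch before each write should reduce overhead.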