
Input and output numpy arrays to h5py

I have Python code whose output is a large matrix (its exact dimensions were shown in an image) whose entries are all of type float. If I save it with the extension .dat, the file size is on the order of 500 MB. I read that using h5py reduces the file size considerably. So, let's say I have a 2D numpy array named A. How do I save it to an h5py file? Also, how do I read the same file back into a numpy array in a different script, since I need to manipulate the array?

h5py provides a model of datasets and groups. The former are basically arrays, and the latter you can think of as directories. Each is named. You should look at the documentation for the API and examples:

http://docs.h5py.org/en/latest/quick.html
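To make the "groups are like directories" idea concrete, here is a minimal sketch. The file, group, and dataset names (groups_demo.h5, experiment_1, temperatures, and so on) are hypothetical and chosen only for illustration; the file is written to a temporary directory.

```python
import os
import tempfile

import h5py
import numpy as np

# Hypothetical file, group, and dataset names, used only for illustration.
path = os.path.join(tempfile.mkdtemp(), "groups_demo.h5")

with h5py.File(path, "w") as f:
    # A group behaves like a directory that holds named datasets.
    grp = f.create_group("experiment_1")
    grp.create_dataset("temperatures", data=np.arange(5.0))
    # A slash-separated name creates the intermediate group automatically.
    f.create_dataset("experiment_2/pressures", data=np.ones((2, 3)))

with h5py.File(path, "r") as f:
    top_level = sorted(f.keys())                       # names at the root
    temperatures = f["experiment_1/temperatures"][:]   # path-style lookup
    pressures_shape = f["experiment_2"]["pressures"].shape  # step-by-step lookup
```

Both lookup styles reach the same objects, much like an absolute path versus changing into a directory first.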

A simple example, where you create all of the data upfront and just want to save it to an hdf5 file, would look something like:

In [1]: import numpy as np
In [2]: import h5py
In [3]: a = np.random.random(size=(100,20))
In [4]: h5f = h5py.File('data.h5', 'w')
In [5]: h5f.create_dataset('dataset_1', data=a)
Out[5]: <HDF5 dataset "dataset_1": shape (100, 20), type "<f8">

In [6]: h5f.close()

You can then load that data back in using:

In [10]: h5f = h5py.File('data.h5','r')
In [11]: b = h5f['dataset_1'][:]
In [12]: h5f.close()

In [13]: np.allclose(a,b)
Out[13]: True

Definitely check out the docs:

http://docs.h5py.org

Writing to an hdf5 file requires either h5py or pytables (each has a different Python API that sits on top of the hdf5 file specification). You should also take a look at the other simple binary formats that numpy provides natively, such as np.save, np.savez, etc.:

http://docs.scipy.org/doc/numpy/reference/routines.io.html
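For comparison, a minimal sketch of the numpy-native route. The file names a.npy and arrays.npz and the second array "doubled" are assumptions made up for this example; np.savez_compressed is the compressed sibling of np.savez.

```python
import os
import tempfile

import numpy as np

a = np.random.random(size=(100, 20))
tmp = tempfile.mkdtemp()  # temporary directory so the sketch leaves no clutter

# Single array: np.save writes a binary .npy file, np.load reads it back.
npy_path = os.path.join(tmp, "a.npy")
np.save(npy_path, a)
b = np.load(npy_path)

# Several arrays in one file: each keyword becomes a named entry in the archive.
npz_path = os.path.join(tmp, "arrays.npz")
np.savez_compressed(npz_path, a=a, doubled=2 * a)
with np.load(npz_path) as archive:
    c = archive["a"]
    d = archive["doubled"]
```

These formats need no extra dependency, but unlike hdf5 they are specific to numpy and do not offer groups or reading only a slice of a compressed archive entry.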

A cleaner way to handle opening and closing the file, so that it is always closed and nothing is leaked:

Prep:

import numpy as np
import h5py

data_to_write = np.random.random(size=(100,20)) # or some such

Write:

with h5py.File('name-of-file.h5', 'w') as hf:
    hf.create_dataset("name-of-dataset",  data=data_to_write)

Read:

with h5py.File('name-of-file.h5', 'r') as hf:
    data = hf['name-of-dataset'][:]
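Since the question was about file size, here is a hedged sketch of h5py's optional gzip compression. Most of the saving compared with a text .dat file typically comes from storing each float as 8 binary bytes rather than as a long decimal string; compression helps further only when the data is repetitive, and random floats barely compress. The file names and the deliberately repetitive test array below are assumptions for illustration.

```python
import os
import tempfile

import h5py
import numpy as np

# Deliberately repetitive data (mostly zeros) so that compression has an effect.
data = np.zeros((1000, 200))
data[::10] = 1.0

tmp = tempfile.mkdtemp()
raw_path = os.path.join(tmp, "raw.h5")          # hypothetical file names
gz_path = os.path.join(tmp, "compressed.h5")

with h5py.File(raw_path, "w") as hf:
    hf.create_dataset("name-of-dataset", data=data)

with h5py.File(gz_path, "w") as hf:
    # compression_opts is the gzip level (0-9); chunking is enabled automatically.
    hf.create_dataset("name-of-dataset", data=data,
                      compression="gzip", compression_opts=4)

raw_size = os.path.getsize(raw_path)
gz_size = os.path.getsize(gz_path)

# Reading is unchanged: decompression happens transparently.
with h5py.File(gz_path, "r") as hf:
    restored = hf["name-of-dataset"][:]
```

Comparing raw_size and gz_size on your own array is the simplest way to see whether compression is worth the extra write time.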
