Read multiple datasets from same Group in h5 file using h5py
I have several groups in my h5 file: 'group1', 'group2', ...
Each group has 3 different datasets: 'dataset1', 'dataset2', 'dataset3'. All of them are arrays of numerical values, but the array sizes differ.
My goal is to save each dataset from a group to a NumPy array.
Example:
import h5py
filename = '../Results/someFileName.h5'
data = h5py.File(filename, 'r')
Now I can easily iterate over all groups with
for i in range(len(data.keys())):
    group = list(data.keys())[i]
but I can't figure out how to access the datasets within the group. So I am looking for something like MATLAB:
hinfo = h5info(filename);
for i = 1:length(hinfo.Groups())
    datasetname = [hinfo.Groups(i).Name '/dataset1'];
    dset = h5read(filename, datasetname);
end
Where dset is now an array of numbers.
Is there a way I could do the same with h5py?
You have the right idea. But you don't need to loop on range(len(data.keys())). Just use data.keys(); it gives you an iterable of the object names. Try this:
import h5py
filename = '../Results/someFileName.h5'
data = h5py.File(filename, 'r')
for group in data.keys():
    print(group)
    for dset in data[group].keys():
        print(dset)
        ds_data = data[group][dset]  # returns HDF5 dataset object
        print(ds_data)
        print(ds_data.shape, ds_data.dtype)
        arr = data[group][dset][:]  # adding [:] returns a numpy array
        print(arr.shape, arr.dtype)
        print(arr)
Note: the logic above is valid ONLY when there are only groups at the top level (no datasets). It does not test object types as groups or datasets.
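The type test that the loop above leaves out can be added with isinstance(). The following is a minimal sketch, not part of the original answer: the helper name read_group_datasets and the demo file contents are hypothetical, and it still assumes the datasets sit directly inside top-level groups (deeper nesting is skipped).

```python
import os
import tempfile

import h5py
import numpy as np


def read_group_datasets(path):
    """Return {group_name: {dataset_name: numpy array}} for top-level groups.

    Hypothetical helper: top-level datasets and nested subgroups are skipped.
    """
    result = {}
    with h5py.File(path, "r") as f:
        for group_name, obj in f.items():
            if not isinstance(obj, h5py.Group):
                continue  # a dataset at the top level, not a group
            result[group_name] = {
                dset_name: dset[()]  # [()] reads the whole dataset into memory
                for dset_name, dset in obj.items()
                if isinstance(dset, h5py.Dataset)
            }
    return result


# Build a small demo file (made-up names and values) and read it back.
demo_path = os.path.join(tempfile.mkdtemp(), "demo.h5")
with h5py.File(demo_path, "w") as f:
    f.create_dataset("top_level", data=np.arange(2))  # not inside a group
    for g in ("group1", "group2"):
        grp = f.create_group(g)
        grp.create_dataset("dataset1", data=np.arange(3))
        grp.create_dataset("dataset2", data=np.ones((2, 2)))

arrays = read_group_datasets(demo_path)
print(sorted(arrays))
print(arrays["group1"]["dataset1"])
```

Because the arrays are copied out inside the with block, they stay usable after the file is closed, and datasets of different sizes can live side by side in the dictionary.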
To avoid these assumptions/limitations, you should investigate .visititems() or write a generator to recursively visit objects. The first 2 answers are examples showing .visititems() usage, and the last 1 uses a generator function:
This example uses isinstance() as the test. The object is a Group when it tests true for h5py.Group and is a Dataset when it tests true for h5py.Dataset. I consider this more Pythonic than the second example below (IMHO).
This method requires that the dataset names ('dataset1', 'dataset2', 'dataset3', etc.) be the same in each of the HDF5 groups of one HDF5 file.
import h5py
import numpy as np

# create empty lists
lat = []
lon = []
x = []
y = []
# fill lists creating numpy arrays
h5f = h5py.File('filename.h5', 'r')  # read file
for group in h5f.keys():  # iterate through groups
    # the dataset names are known, so no inner loop over the datasets is needed
    # (such a loop would append the same data once per dataset in the group)
    lat = np.append(lat, h5f[group]['lat'][()])  # append data
    lon = np.append(lon, h5f[group]['lon'][()])
    x = np.append(x, h5f[group]['x'][()])
    y = np.append(y, h5f[group]['y'][()])
h5f.close()  # close the file once the arrays are filled
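For files where groups and datasets are mixed or nested more deeply, the .visititems() approach mentioned above can be sketched as follows. This is an illustration rather than code from the linked answers; the helper name collect_datasets and the demo file layout are hypothetical.

```python
import os
import tempfile

import h5py
import numpy as np


def collect_datasets(path):
    """Return {path_inside_file: numpy array} for every dataset in the file."""
    arrays = {}

    def visitor(name, obj):
        # visititems() calls this for each object below the root group;
        # name is the object's path relative to that group.
        if isinstance(obj, h5py.Dataset):
            arrays[name] = obj[()]  # read the full dataset into memory
        # Returning None tells visititems() to keep visiting.

    with h5py.File(path, "r") as f:
        f.visititems(visitor)
    return arrays


# Demo file with made-up contents: nested groups plus a top-level dataset.
demo_path = os.path.join(tempfile.mkdtemp(), "nested_demo.h5")
with h5py.File(demo_path, "w") as f:
    f.create_dataset("group1/dataset1", data=np.arange(4))
    f.create_dataset("group1/sub/dataset2", data=np.zeros(2))
    f.create_dataset("loose", data=np.array([1.5]))

found = collect_datasets(demo_path)
for name, arr in sorted(found.items()):
    print(name, arr.shape, arr.dtype)
```

Keying the result by the path inside the file keeps arrays of different sizes separate and does not depend on every group using the same dataset names.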