
Read multiple datasets from same Group in h5 file using h5py

I have several groups in my h5 file: 'group1', 'group2', ... and each group has 3 different datasets: 'dataset1', 'dataset2', 'dataset3', all of which are arrays of numerical values, but the array sizes differ.

My goal is to save each dataset from each group to a numpy array.

Example:

import h5py
filename = '../Results/someFileName.h5'
data = h5py.File(filename, 'r')

Now I can easily iterate over all groups with

for i in range(len(data.keys())):
    group = list(data.keys())[i]

but I can't figure out how to access the datasets within the group. So I am looking for something like MATLAB:

hinfo = h5info(filename);
for i = 1:length(hinfo.Groups())
     datasetname = [hinfo.Groups(i).Name '/dataset1'];
     dset = h5read(filename, datasetname);
end

Where dset is now an array of numbers.

Is there a way I could do the same with h5py?

You have the right idea. But you don't need to loop on range(len(data.keys())). Just use data.keys(); it returns an iterable of object names. Try this:

import h5py
filename = '../Results/someFileName.h5'
data = h5py.File(filename, 'r')
for group in data.keys():
    print(group)
    for dset in data[group].keys():
        print(dset)
        ds_data = data[group][dset]  # returns HDF5 dataset object
        print(ds_data)
        print(ds_data.shape, ds_data.dtype)
        arr = data[group][dset][:]  # adding [:] returns a numpy array
        print(arr.shape, arr.dtype)
        print(arr)

Note: the logic above is valid ONLY when the top level contains only groups (no datasets). It does not test whether objects are groups or datasets.
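One way to relax that assumption while mirroring the MATLAB loop from the question is to test each object's type before reading it. The following is a minimal sketch, not a definitive implementation: the helper name read_from_each_group and the in-memory demo file (including its 'stray' and 'other' datasets) are made up for illustration.

```python
import h5py
import numpy as np


def read_from_each_group(h5file, dataset_name):
    """Return {group_name: numpy array} for dataset_name in each top-level group."""
    arrays = {}
    for group_name, obj in h5file.items():
        if not isinstance(obj, h5py.Group):
            continue  # skip datasets stored directly at the top level
        member = obj.get(dataset_name)  # None when the group lacks this name
        if isinstance(member, h5py.Dataset):
            arrays[group_name] = member[()]  # [()] reads the whole dataset
    return arrays


# Hypothetical demo data, kept in memory so nothing is written to disk.
with h5py.File("demo.h5", "w", driver="core", backing_store=False) as f:
    f.create_dataset("stray", data=np.zeros(2))
    f.create_dataset("group1/dataset1", data=np.arange(3))
    f.create_dataset("group2/dataset1", data=np.arange(5.0))
    f.create_dataset("group3/other", data=np.ones(4))
    result = read_from_each_group(f, "dataset1")

print({name: arr.shape for name, arr in result.items()})
```

Returning a dict keyed by group name keeps arrays of different sizes separate, which matches the question's statement that the array sizes differ.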

To avoid these assumptions/limitations, you should investigate .visititems() or write a generator to recursively visit objects. The first 2 answers below are examples of .visititems() usage, and the last one uses a generator function:

  1. Use visititems(-function-) to loop recursively
    This example uses isinstance() as the test. The object is a Group when it tests true for h5py.Group and is a Dataset when it tests true for h5py.Dataset. I consider this more Pythonic than the second example below (IMHO).
  2. Convert hdf5 to raw organised in folders
    It checks the number of objects below the visited object. When there are no subgroups, it is a dataset, and when there are subgroups, it is a group.
  3. How can I combine multiple .h5 file?
    This question has multiple answers. This answer uses a generator to merge data from several files with several groups and datasets into a single file.
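The two approaches named above can be sketched roughly as follows. This is an illustration written for this page, not the code from the linked answers: the function names collect_datasets and iter_datasets and the in-memory demo file are hypothetical.

```python
import h5py
import numpy as np


def collect_datasets(h5file):
    """Return {path: numpy array} for every dataset at any depth, via visititems."""
    arrays = {}

    def visitor(name, obj):
        # name is the object's path relative to the group being visited.
        if isinstance(obj, h5py.Dataset):
            arrays[name] = obj[()]
        # Implicitly returning None tells visititems to keep walking.

    h5file.visititems(visitor)
    return arrays


def iter_datasets(group, prefix=""):
    """Generator alternative: yield (path, dataset) pairs recursively."""
    for name, obj in group.items():
        path = f"{prefix}/{name}" if prefix else name
        if isinstance(obj, h5py.Dataset):
            yield path, obj
        elif isinstance(obj, h5py.Group):
            yield from iter_datasets(obj, path)


# Hypothetical demo data, kept in memory so nothing is written to disk.
with h5py.File("demo.h5", "w", driver="core", backing_store=False) as f:
    f.create_dataset("top_level", data=np.zeros(2))
    f.create_dataset("group1/dataset1", data=np.arange(3))
    f.create_dataset("group1/sub/dataset2", data=np.arange(4))
    by_visit = collect_datasets(f)
    by_generator = {path: ds[()] for path, ds in iter_datasets(f)}

print(sorted(by_visit))
print(sorted(by_generator))
```

Both versions handle datasets at the top level and in nested subgroups. The generator yields dataset objects rather than arrays, so the caller decides when the data is actually read from the file.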

This method requires that the dataset names ('dataset1', 'dataset2', 'dataset3', etc.) be the same in each group of the HDF5 file.

import h5py
import numpy as np

# create empty lists
lat = []
lon = []
x = []
y = []

# fill lists creating numpy arrays
h5f = h5py.File('filename.h5', 'r') # read file
for group in h5f.keys(): # iterate through groups
    # each named dataset is read once per group
    lat = np.append(lat, h5f[group]['lat'][()]) # append data
    lon = np.append(lon, h5f[group]['lon'][()])
    x = np.append(x, h5f[group]['x'][()])
    y = np.append(y, h5f[group]['y'][()])
h5f.close()
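Because np.append copies the whole accumulated array on every call, a variation is to collect the per-group pieces in Python lists and concatenate once at the end. The sketch below assumes the same layout as above (every group holds the same dataset names) and at least one group in the file. The helper name concat_across_groups and the demo values are invented for illustration.

```python
import h5py
import numpy as np


def concat_across_groups(h5file, names):
    """Return {name: 1-D array} joining each named dataset over all top-level groups."""
    chunks = {name: [] for name in names}
    for group in h5file.values():
        if not isinstance(group, h5py.Group):
            continue  # ignore datasets stored at the top level
        for name in names:
            # np.ravel flattens each piece, matching np.append's default behaviour
            chunks[name].append(np.ravel(group[name][()]))
    # Assumes at least one group was found; np.concatenate rejects an empty list.
    return {name: np.concatenate(parts) for name, parts in chunks.items()}


# Hypothetical demo data, kept in memory so nothing is written to disk.
with h5py.File("demo.h5", "w", driver="core", backing_store=False) as f:
    f.create_dataset("group1/lat", data=[1.0, 2.0])
    f.create_dataset("group1/lon", data=[10.0, 20.0])
    f.create_dataset("group2/lat", data=[3.0])
    f.create_dataset("group2/lon", data=[30.0])
    merged = concat_across_groups(f, ["lat", "lon"])

print(merged["lat"], merged["lon"])
```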
