Extracting datasets from 1 HDF5 file to multiple files

I previously asked a question about generating images from an HDF5 file. Now I have another problem: generating new h5 files from an existing one.

For instance, I have a file [ABC.h5] that contains a dataset of images and a dataset of their ground truth density maps. The keys are [images, density_maps].

Instead of the single h5 file, I want [GT_001.h5], [GT_002.h5], ..., each holding the [density_maps] entry extracted for one image.

How can I achieve this? Thanks a lot.

[EDIT] Here is more related information. Thank you @kcw78 for the guidance. In the original dataset in the CRSNet, each image file has its ground truth density map stored in an h5 file. This density map is <HDF5 dataset "density": shape (544, 932), type "<f4"> <class 'h5py._hl.dataset.Dataset'>. Therefore, in this dataset, each IMG_001.jpg has a corresponding IMG_001.h5.

The dataset I have is a single h5 file with the following contents:
<HDF5 dataset "density_maps": shape (300, 380, 676, 1), type "<f4"> <class 'h5py._hl.dataset.Dataset'>
<HDF5 dataset "images": shape (300, 380, 676, 1), type "|u1"> <class 'h5py._hl.dataset.Dataset'>
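To confirm what a file like this contains before extracting anything, a small helper can list every dataset with its shape and dtype. This is only a minimal sketch: describe_h5 is a hypothetical helper name, not part of h5py, and the path passed to it is whatever your combined file is called.

```python
import h5py


def describe_h5(path):
    """Return {dataset_name: (shape, dtype_string)} for every dataset in the file."""
    info = {}

    def visit(name, obj):
        # visititems walks groups and datasets; keep only the datasets
        if isinstance(obj, h5py.Dataset):
            info[name] = (obj.shape, obj.dtype.str)

    with h5py.File(path, "r") as f:
        f.visititems(visit)
    return info
```

Calling describe_h5 on the combined file should report the two keys from the question, images and density_maps, which makes it easy to check the array layout before looping over it.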

I have successfully generated the corresponding images from the file. My current problem is how to loop over the dataset, copy each entry into a new h5 file, and so build a corresponding density map h5 for each image. For example, how can I produce IMG_001.h5, ... from this single h5 file?

This answers your question based on my interpretation of your data. If it doesn't solve your problem, please clarify the summary below.

First, please be careful with the term "dataset". It has a specific meaning in h5py. You use "dataset" to refer to a set of data used for training and testing a CNN. That gets confusing when there are also datasets IN an HDF5 file.

Based on your explanation, this is my understanding of the different files you have for training and testing.

Your original set of training and testing data in the CRSNet:
image files : IMG_###.jpg
ground truth density map files : IMG_###.h5 with attributes: name="density"; shape=(544, 932); type="<f4"
You have pairs of image and density files: one .jpg and one .h5 file for each of IMG_001 through IMG_NNN.

Your new set of training and testing data:
H5 Filename : [ABC.h5]
H5 Dataset 1 : name="images", shape=(300, 380, 676, 1), type="|u1"
H5 Dataset 2 : name="density_maps", shape=(300, 380, 676, 1), type="<f4"

You have extracted the data from the "images" dataset in this .h5 file to create IMG_###.jpg (like your original set of training and testing data). Now you want to extract arrays from the "density_maps" dataset in the .h5 file to create IMG_###.h5.

If so, the process is the same as the image extraction procedure. The only difference is that you write the data to a .h5 file instead of a .jpg file. See the code sketch below.

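import h5py

# 'yourfile.h5' is a placeholder for the combined file (e.g. ABC.h5)
with h5py.File('yourfile.h5', 'r') as h5r:
    dmaps = h5r['density_maps']
    for i in range(dmaps.shape[0]):
        # read the i-th density map as a NumPy array
        dmap_arr = dmaps[i, :]
        # i starts at 0, so the first file written is IMG_000.h5
        with h5py.File(f'IMG_{i:03}.h5', 'w') as h5w:
            h5w.create_dataset('density_maps', data=dmap_arr)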

Note: when you read dmap_arr you may get shape=(380, 676, 1). If so, you can reshape it with .reshape(380, 676), like this:

        dmap_arr = h5r['density_maps'][i,:].reshape(380, 676)
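Putting the loop and the reshape together, the whole extraction could be wrapped in one function. The following is a sketch under stated assumptions, not a definitive implementation: split_density_maps, out_dir, dst_key and start are hypothetical names introduced here. It assumes you want 1-based file names (IMG_001.h5 first) and that the per-image dataset should be called "density", as in the original per-image files described in the question. Adjust both if your training code expects something else. Instead of hard-coding (380, 676), it drops a trailing axis of length 1.

```python
import os

import h5py


def split_density_maps(src_path, out_dir, src_key="density_maps",
                       dst_key="density", start=1):
    """Write one .h5 file per density map; return the list of paths written."""
    os.makedirs(out_dir, exist_ok=True)
    written = []
    with h5py.File(src_path, "r") as h5r:
        dset = h5r[src_key]
        for i in range(dset.shape[0]):
            dmap = dset[i]  # one map as a NumPy array, e.g. shape (380, 676, 1)
            # drop the trailing channel axis if it has length 1
            if dmap.ndim == 3 and dmap.shape[-1] == 1:
                dmap = dmap[..., 0]
            # assumption: file numbering starts at `start` (1 by default)
            out_path = os.path.join(out_dir, f"IMG_{i + start:03d}.h5")
            with h5py.File(out_path, "w") as h5w:
                h5w.create_dataset(dst_key, data=dmap)
            written.append(out_path)
    return written
```

For example, split_density_maps("ABC.h5", "ground_truth") would write one file per entry of "density_maps" into a ground_truth folder. Make sure the numbering matches the one you used when you exported the .jpg files, so that each image and its density map share the same index.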

