
Creating h5 file for storing a dataset to train super resolution GAN

I am trying to create an h5 file to store a dataset for training a super resolution GAN, where each training pair is a low-resolution and a high-resolution image. The dataset will contain the data in the following manner: [[LR1,HR1],[LR2,HR2],...[LRn,HRn]]. I have 256x256 RGB images for HR and 128x128 RGB images for LR. I am unsure about the best way to store this in an h5 file, and should I scale the images by 255 before storing them in the h5 file?

I have written the following code to do so. Any help/suggestions would be highly appreciated.

import h5py
import numpy as np
import os
import cv2
import glob



def store_super_resolution_dataset_in_h5_file(path_to_LR,path_to_HR):
    '''This function takes the files with the same name from LR and HR folders and stores the new dataset in h5 format'''
    #create LR and HR image lists
    LR_images = glob.glob(path_to_LR+'*.jpg')
    HR_images = glob.glob(path_to_HR+'*.jpg')
    #sort the lists
    LR_images.sort()
    HR_images.sort()
    print('LR_images: ',LR_images)
    print('HR_images: ',HR_images)
    #create a h5 file
    h5_file = h5py.File('super_resolution_dataset.h5','w')
    #create a dataset in the h5 file
    dataset = h5_file.create_dataset('super_resolution_dataset',(len(LR_images),2,256,256),dtype='f')
    #store the images in the dataset
    for i in range(len(LR_images)):
        LR_image = cv2.imread(LR_images[i])
        HR_image = cv2.imread(HR_images[i])
        dataset[i,0,:,:] = LR_image
        dataset[i,1,:,:] = HR_image
    #close the h5 file
    h5_file.close()

There are 2 code segments below. The first shows my recommended method: loading the high-res and low-res images into separate datasets to reduce the HDF5 file size. The second simply corrects the errors in your code (modified to use a with/as: context manager). Both code segments begin after the #create a h5 file comment.

I ran a test with 43 images to compare the resulting file sizes. Results are:

  • 1 dataset size = 66.0 MB
  • 2 dataset size = 41.3 MB (37% reduction)

Recommended method using 2 datasets:

#create a h5 file
with h5py.File('low_hi_resolution_dataset.h5','w') as h5_file:
    #create 2 datasets for LR and HR images in the h5 file
    lr_ds = h5_file.create_dataset('low_res_dataset',(len(LR_images),128,128,3),dtype='f')
    hr_ds = h5_file.create_dataset('hi_res_dataset',(len(LR_images),256,256,3),dtype='f')
    #store the images in the dataset
    for i in range(len(LR_images)):
        LR_image = cv2.imread(LR_images[i])
        HR_image = cv2.imread(HR_images[i])
        lr_ds[i] = LR_image
        hr_ds[i] = HR_image
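To answer the scaling question from the post: one option is to store the raw 0-255 pixel values and divide by 255 only when reading batches back for training, so the stored data stays faithful to the source images. Below is a minimal sketch of reading the two datasets back this way; the in-memory HDF5 driver (`driver='core'`, `backing_store=False`) and the 2-sample arrays filled with constant values are purely illustrative stand-ins for real `cv2.imread` output:

```python
import numpy as np
import h5py

# In-memory HDF5 file so nothing touches disk; shapes match the
# two-dataset layout above, with 2 illustrative sample pairs.
n = 2
with h5py.File('demo.h5', 'w', driver='core', backing_store=False) as f:
    lr_ds = f.create_dataset('low_res_dataset', (n, 128, 128, 3), dtype='f')
    hr_ds = f.create_dataset('hi_res_dataset', (n, 256, 256, 3), dtype='f')
    # Constant-valued arrays standing in for cv2.imread() results (0-255 range)
    lr_ds[0] = np.full((128, 128, 3), 255.0)
    hr_ds[0] = np.full((256, 256, 3), 128.0)
    # Scale to [0, 1] at read time rather than at storage time
    lr_batch = lr_ds[:] / 255.0
    hr_batch = hr_ds[:] / 255.0
```

`lr_batch` and `hr_batch` then hold float arrays in [0, 1] ready to feed to the GAN, while the file keeps the original pixel values.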

Modifications to your method:

#create a h5 file
with h5py.File('super_resolution_dataset.h5','w') as h5_file:
    #create a dataset in the h5 file
    dataset = h5_file.create_dataset('super_resolution_dataset',(len(LR_images),2,256,256,3),dtype='f')
    #store the images in the dataset
    for i in range(len(LR_images)):
        LR_image = cv2.imread(LR_images[i])
        HR_image = cv2.imread(HR_images[i])
        dataset[i,0,0:128,0:128,:] = LR_image
        dataset[i,1,:,:,:] = HR_image
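If file size matters further, one variant worth considering (not part of the measured comparison above) is storing the pixels as uint8, which is `cv2.imread`'s native dtype, instead of the float `'f'` dtype; that alone is a quarter of the bytes per pixel, and gzip-compressed chunks can shrink the file more. A sketch under those assumptions, using the in-memory driver so it runs without writing a file:

```python
import numpy as np
import h5py

# Hypothetical uint8 + gzip variant of the low-res dataset above.
# chunks=(1, 128, 128, 3) means each image is compressed independently,
# so reading one training sample only decompresses that sample.
n = 2
with h5py.File('compact.h5', 'w', driver='core', backing_store=False) as f:
    lr_ds = f.create_dataset('low_res_dataset', (n, 128, 128, 3),
                             dtype='uint8',
                             chunks=(1, 128, 128, 3),
                             compression='gzip')
    lr_ds[0] = np.zeros((128, 128, 3), dtype=np.uint8)
    stored_dtype = lr_ds.dtype
    sample = lr_ds[0]
```

The trade-off is a little decompression cost at read time; whether that is worth it depends on how the training loader accesses the file.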
