How to extract a dataset from an HDF5 file and convert it to TIFF?

Question

I am trying to import CT scan data into ImageJ/Fiji. There is an HDF5 plugin for ImageJ/Fiji, but the synchrotron CT datasets are so large that the file failed to open. The scan data (image dataset) is stored as a dataset inside the HDF5 file, so I need to extract the image dataset from the HDF5 file and then convert it to a TIFF file.

The HDF5 file path is "F:/New_ESRF/SNT_BTO4/SNT_BTO4_S1/SNT_BTO4_S1_1_1pag_db0005_vol.hdf5". The file 'SNT_BTO4_S1_1_1pag_db0005_vol.hdf5' contains several datasets, and the image dataset is located here: /entry0000/reconstruction/results/data

So far, I have accessed the image dataset using h5py. However, I am stuck on how to extract/save the dataset separately from the HDF5 file.

What code is required to extract the image dataset from the HDF5 file?
After that, I am thinking of using Image from PIL to convert the image to a TIFF file. Can I get any advice on the code for this?

import numpy as np
import h5py
filename = "F:/New_ESRF/SNT_BTO4/SNT_BTO4_S1/SNT_BTO4_S1_1_1pag_db0005_vol.hdf5"

with h5py.File(filename,'r') as hdf:
    base_items = list (hdf.items())
    print('#Items in the base directory:', base_items)

    #entry0000
    G1 = hdf.get ('entry0000')
    G1_items = list (G1.items())
    print('#Items in entry0000', G1_items)

    #reconstruction
    G11 = G1.get ('/entry0000/reconstruction')
    G11_items = list (G11.items())
    print('#Items in reconstruction', G11_items)

    #results_data
    G12 = G11.get ('/entry0000/reconstruction/results')
    G12_items = list (G12.items())
    print('#Items in results', G12_items)

Answer 1

Extracting image data from an HDF5 file and converting it to an image is a relatively straightforward two-step process:

1. Access the data in the HDF5 file
2. Convert to an image with cv2 (or PIL)

A simple example is available here: How to extract individual JPEG images from a HDF5 file.

You can apply the same process to your file. Here is some pseudo-code. It's not complete because you don't show the shape of the image dataset (and the shape affects how to read the data). Also, you didn't say how many images are in dataset /entry0000/reconstruction/results/data. Does it hold a single image or multiple images? If multiple images, which axis is the image counter?
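To answer those questions before writing any conversion loop, the dataset's shape, dtype, and chunk layout can be read without loading the pixel data. The following is a minimal sketch, not the asker's actual file: `describe_dataset` is a hypothetical helper, and the small temporary file only stands in for the real volume so the example runs anywhere.

```python
import os
import tempfile

import h5py
import numpy as np


def describe_dataset(h5_path, dataset_path):
    """Return shape, dtype and chunk layout without reading the pixel data."""
    with h5py.File(h5_path, "r") as hdf:
        ds = hdf[dataset_path]
        return {"shape": ds.shape, "dtype": str(ds.dtype), "chunks": ds.chunks}


# Hypothetical stand-in file (4 slices of 8 x 16 pixels); replace demo_path
# with the path to the real HDF5 file.
demo_path = os.path.join(tempfile.mkdtemp(), "demo_vol.hdf5")
with h5py.File(demo_path, "w") as hdf:
    hdf.create_dataset(
        "entry0000/reconstruction/results/data",
        data=np.zeros((4, 8, 16), dtype=np.float32),
    )

info = describe_dataset(demo_path, "/entry0000/reconstruction/results/data")
print(info)
```

For a reconstructed CT volume the shape is typically three-dimensional, and the axis that counts slices determines how the dataset should be indexed when exporting images.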

import h5py
import cv2 ## for image conversion

filename = "F:/New_ESRF/SNT_BTO4/SNT_BTO4_S1/SNT_BTO4_S1_1_1pag_db0005_vol.hdf5"

with h5py.File(filename,'r') as hdf:
    # get image dataset (no pixel data is read until the dataset is sliced)
    img_ds = hdf['/entry0000/reconstruction/results/data']
    print(f'Image Dataset info: Shape={img_ds.shape},Dtype={img_ds.dtype}')
    ## following depends on dataset shape/schema
    ## code below assumes images are along axis=0
    for i in range(img_ds.shape[0]):
        # load one image to a numpy array; for a 3D dataset [i,:] gets [i,:,:]
        img_arr = img_ds[i,:]
        cv2.imwrite(f'test_img_{i:03}.tiff',img_arr)
        # alternately, pass the slice directly (writes the same file):
        # cv2.imwrite(f'test_img_{i:03}.tiff',img_ds[i,:])
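Since the question mentions PIL, the same loop can be sketched with Pillow instead of cv2. This is an illustration under stated assumptions: the dataset is assumed to be 3D with slices along axis 0, `export_slices_to_tiff` and the `slice_0000.tiff` naming are hypothetical, and the temporary demo file stands in for the real volume. Pillow accepts only certain array dtypes (float32 is used here), so other dtypes may need converting first.

```python
import os
import tempfile

import h5py
import numpy as np
from PIL import Image


def export_slices_to_tiff(h5_path, dataset_path, out_dir, prefix="slice"):
    """Write each 2D slice along axis 0 to its own TIFF file; return the paths."""
    os.makedirs(out_dir, exist_ok=True)
    written = []
    with h5py.File(h5_path, "r") as hdf:
        ds = hdf[dataset_path]
        if ds.ndim != 3:
            raise ValueError(f"expected a 3D dataset, got shape {ds.shape}")
        for i in range(ds.shape[0]):
            arr = ds[i, :, :]  # only this slice is read from disk
            out_path = os.path.join(out_dir, f"{prefix}_{i:04d}.tiff")
            Image.fromarray(arr).save(out_path)
            written.append(out_path)
    return written


# Hypothetical stand-in volume so the sketch runs without the real file.
work_dir = tempfile.mkdtemp()
demo_path = os.path.join(work_dir, "demo_vol.hdf5")
volume = np.arange(3 * 4 * 5, dtype=np.float32).reshape(3, 4, 5)
with h5py.File(demo_path, "w") as hdf:
    hdf.create_dataset("entry0000/reconstruction/results/data", data=volume)

paths = export_slices_to_tiff(
    demo_path,
    "/entry0000/reconstruction/results/data",
    os.path.join(work_dir, "tiff_out"),
)
print(len(paths), "files written")
```

Because each iteration reads a single slice, the whole volume never has to fit in memory, which matters for large synchrotron datasets. The resulting files can then be opened in ImageJ/Fiji as an image sequence.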

Note: you don't need to use .get() to get a dataset. You can simply reference the dataset path. Also, when you use a group object, use the path relative to that group, not the absolute path. (You should modify your code to reflect these changes.) For example, the following are equivalent:

G1 = hdf['entry0000']  
## is the same as     G1 = hdf.get('entry0000')
G11 = hdf['entry0000/reconstruction']  
## is the same as     G11 = hdf.get('entry0000/reconstruction')
## OR referencing G1 group object:
G11 = G1['reconstruction']
## is the same as     G11 = G1.get('reconstruction')
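The equivalence can be checked on a small throwaway file. This is only a sketch: the temporary file and the name `no_such_group` are made up for illustration. It also shows the one place where the two forms differ, namely a missing name: `.get()` returns None, while square-bracket indexing raises KeyError.

```python
import os
import tempfile

import h5py

# Hypothetical stand-in file that mimics the group layout from the question.
demo_path = os.path.join(tempfile.mkdtemp(), "demo_groups.hdf5")
with h5py.File(demo_path, "w") as hdf:
    hdf.create_group("entry0000/reconstruction/results")

with h5py.File(demo_path, "r") as hdf:
    G1 = hdf["entry0000"]
    by_full_path = hdf["entry0000/reconstruction"]  # path from the file root
    by_relative_path = G1["reconstruction"]         # path relative to G1
    same_group = by_full_path.name == by_relative_path.name

    # Behaviour for a name that does not exist
    missing = G1.get("no_such_group")
    try:
        G1["no_such_group"]
        raised_key_error = False
    except KeyError:
        raised_key_error = True

print(same_group, missing, raised_key_error)
```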


solution1
0 2021-12-12 16:51:08
