
Combine datasets of multiple HDF5 files into a virtual dataset

I have several HDF5 files that each contain the same two datasets, data and labels. These datasets are multidimensional arrays, and the first dimension is the same for both.

I would like to combine the HDF5 files into one file, and I think the best way would be to create a virtual dataset (see the h5py reference and the HDF5 tutorial in C++). However, I have not found any example using Python and h5py.

Is there any alternative to the virtual dataset or do you know of any example using h5py?

Give the GDAL virtual format a try.

Someone has tried it. There is an example here, but unfortunately I was not able to get it to work, and it also seems to be syntactically incorrect: https://github.com/aaron-parsons/h5py/blob/1e467f6db3df23688e90f44bde7558bde7173a5b/docs/vds.rst#using-the-vds-feature-from-h5py

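import h5py
import numpy as np

# Cleaned-up version of the linked draft example. VirtualTarget, VirtualMap and
# create_virtual_dataset(VMlist=...) belong to the draft API in that branch;
# h5py 2.9+ exposes VirtualSource and VirtualLayout instead.
file_names_to_concatenate = ['1.h5', '2.h5', '3.h5', '4.h5', '5.h5']
entry_key = 'data'  # where the data is inside of the source files.
outfile, outkey = 'VDS.h5', 'data'  # assumed output file name and dataset key
with h5py.File(file_names_to_concatenate[0], 'r') as first:
    sh = first[entry_key].shape  # get the first one's shape.

f = h5py.File(outfile, 'w', libver='latest')
TGT = h5py.VirtualTarget(outfile, outkey, shape=(len(file_names_to_concatenate),) + sh)

VMlist = []
for i in range(len(file_names_to_concatenate)):
    VSRC = h5py.VirtualSource(file_names_to_concatenate[i], entry_key, shape=sh)
    VM = h5py.VirtualMap(VSRC[:, :, :], TGT[i:(i + 1):1, :, :, :], dtype=np.float64)
    VMlist.append(VM)

d = f.create_virtual_dataset(VMlist=VMlist, fillvalue=0)
f.close()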

This is an old question but anyway...

Full support for virtual datasets only arrived in h5py v2.9, released on 20 Dec 2018.
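Before trying any of this, it may be worth checking what your installation supports. The snippet below is only a rough sketch: the 2.9 threshold comes from the release mentioned above, while the HDF5 1.10 threshold is my assumption based on that being the library version that introduced virtual datasets.

import h5py

# Assumption: the high-level VDS API needs h5py 2.9+ built against HDF5 1.10+.
h5py_ok = tuple(h5py.version.version_tuple[:2]) >= (2, 9)
hdf5_ok = tuple(h5py.version.hdf5_version_tuple[:2]) >= (1, 10)
vds_supported = h5py_ok and hdf5_ok and hasattr(h5py, "VirtualLayout")

print("h5py", h5py.version.version, "/ HDF5", h5py.version.hdf5_version)
print("virtual datasets available:", vds_supported)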

The h5py repository has this example of creating a virtual dataset: https://github.com/h5py/h5py/blob/master/examples/vds_simple.py
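For reference, here is a small self-contained sketch in the spirit of that example, not a copy of it. It stacks one 1D dataset per file along a new first axis. The temporary directory, the file names and the sizes are made up for the demo.

import os
import tempfile

import h5py
import numpy as np

workdir = tempfile.mkdtemp()  # hypothetical scratch directory for the demo
n_files, n_points = 4, 100

# Create small source files, each holding a 1D dataset called "data".
paths = []
for n in range(n_files):
    path = os.path.join(workdir, f"{n + 1}.h5")
    with h5py.File(path, "w") as f:
        f.create_dataset("data", data=np.full(n_points, n + 1, dtype="i4"))
    paths.append(path)

# Stack the sources along a new first axis: row n of the layout maps to file n.
layout = h5py.VirtualLayout(shape=(n_files, n_points), dtype="i4")
for n, path in enumerate(paths):
    layout[n] = h5py.VirtualSource(path, "data", shape=(n_points,))

vds_path = os.path.join(workdir, "VDS.h5")
with h5py.File(vds_path, "w", libver="latest") as f:
    f.create_virtual_dataset("data", layout, fillvalue=-5)

# Reading the virtual dataset pulls the values from the source files.
with h5py.File(vds_path, "r") as f:
    stacked = f["data"][:]

print(stacked.shape)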

I also did some experimenting to concatenate the data sets that the example creates. This just creates a 1D array.

import h5py
import numpy as np

file_names_to_concatenate = ['1.h5', '2.h5', '3.h5', '4.h5']
entry_key = 'data'  # where the data is inside of the source files.

# Build one VirtualSource per file and add up the lengths along the first axis.
sources = []
total_length = 0
for filename in file_names_to_concatenate:
    with h5py.File(filename, 'r') as active_data:
        # VirtualSource records the file name, dataset path and shape,
        # so it can still be used after the file is closed.
        vsource = h5py.VirtualSource(active_data[entry_key])
        total_length += vsource.shape[0]
        sources.append(vsource)

# np.float was removed in NumPy 1.24; np.float64 is the same type.
layout = h5py.VirtualLayout(shape=(total_length,),
                            dtype=np.float64)

# Map each source onto its own consecutive slice of the layout.
offset = 0
for vsource in sources:
    length = vsource.shape[0]
    layout[offset : offset + length] = vsource
    offset += length

with h5py.File("VDS_con.h5", 'w', libver='latest') as f:
    f.create_virtual_dataset(entry_key, layout, fillvalue=0)
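The question is about multidimensional data and labels datasets, so here is a rough sketch of how the same idea might be extended to join both along the first axis. I am assuming that all files agree on the trailing dimensions and dtype of each dataset. The helper name concatenate_virtual and the demo files are hypothetical.

import os
import tempfile

import h5py
import numpy as np


def concatenate_virtual(source_paths, out_path, keys=("data", "labels")):
    """Create one virtual dataset per key, joining the sources along axis 0."""
    with h5py.File(out_path, "w", libver="latest") as out:
        for key in keys:
            sources, tail, dtype = [], None, None
            for path in source_paths:
                with h5py.File(path, "r") as src:
                    dset = src[key]
                    if tail is None:
                        # Assumed identical in every source file.
                        tail, dtype = dset.shape[1:], dset.dtype
                    sources.append(h5py.VirtualSource(dset))
            total = sum(vs.shape[0] for vs in sources)
            layout = h5py.VirtualLayout(shape=(total,) + tail, dtype=dtype)
            offset = 0
            for vs in sources:
                length = vs.shape[0]
                layout[offset:offset + length] = vs
                offset += length
            out.create_virtual_dataset(key, layout, fillvalue=0)


# Demo with two made-up source files of different lengths.
workdir = tempfile.mkdtemp()
paths = []
for n, rows in enumerate((3, 5)):
    path = os.path.join(workdir, f"part{n}.h5")
    with h5py.File(path, "w") as f:
        f.create_dataset("data", data=np.full((rows, 2, 2), float(n)))
        f.create_dataset("labels", data=np.full(rows, n, dtype="i8"))
    paths.append(path)

combined = os.path.join(workdir, "combined.h5")
concatenate_virtual(paths, combined)

with h5py.File(combined, "r") as f:
    data, labels = f["data"][:], f["labels"][:]

print(data.shape, labels.shape)

The combined file only stores the mapping, so the source files need to stay where they were when the virtual datasets were created.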
