I have some instrument data saved in HDF5 format as multiple 2-D arrays, along with the measurement time. As the attached figures show, d1 and d2 are two independent files that the instrument recorded at different times. They have the same data variables; the only difference is the length of phony_dim_0, which represents the total number of data points and varies with the measurement time.
These files need to be loaded into specific software provided by the instrument manufacturer to obtain meaningful results. I want to merge multiple files with Python xarray while keeping their original format, and then load the single merged file into the software.
Here is my attempt:
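    import os
    import numpy as np
    import xarray

    files = os.listdir("DATA_PATH")
    d1 = xarray.open_dataset(files[0])
    d2 = xarray.open_dataset(files[1])
    # Copy d1 into a new dataset to hold the merged data arrays
    d0 = d1.copy()
    vars_ = [c for c in d1]
    for var in vars_:
        # Fails: the stacked array is longer than the existing variable
        d0[var].values = np.vstack([d1[var], d2[var]])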
This raises the following error:
replacement data must match the Variable's shape. replacement data has shape (761, 200); Variable has shape (441, 200)
I thought about two solutions for this problem:
However, I still could not figure out which function would achieve that. Any comments or suggestions would be appreciated.
Supplemental information
I'm not familiar with xarray, so I can't help with your code. However, you don't need xarray to copy HDF5 data; h5py is designed to work with HDF5 data as NumPy arrays, and is all you need to merge the data.
A note about Xarray. It uses different nomenclature than HDF5 and h5py. Xarray refers to the files as 'datasets', and calls the HDF5 datasets 'data variables'. HDF5/h5py nomenclature is more frequently used, so I am going to use it for the rest of my post.
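To make the h5py terms concrete, here is a minimal sketch that lists every top-level HDF5 dataset in a file (what Xarray would call a data variable) together with its shape. The helper name `list_datasets`, the file name `demo_names.h5`, and the dataset name `signal` are made up for illustration; substitute your own file.

```python
import h5py
import numpy as np

def list_datasets(path):
    """Return {dataset name: shape} for the top-level datasets in an HDF5 file."""
    found = {}
    with h5py.File(path, 'r') as h5f:
        for name, obj in h5f.items():
            # Groups can also live at the top level; keep only datasets
            if isinstance(obj, h5py.Dataset):
                found[name] = obj.shape
    return found

# Build a small throwaway file so the sketch runs on its own
with h5py.File('demo_names.h5', 'w') as h5f:
    ds = h5f.create_dataset('signal', data=np.zeros((4, 3)))
    ds.attrs['units'] = 'counts'  # attributes travel with the dataset

print(list_datasets('demo_names.h5'))
```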
There are some things to consider when merging datasets across 2 or more HDF5 files. They are:
I looked at your files. You have 8 HDF5 datasets in each file. One nice thing: the datasets are resizable. That simplifies the merge process. Also, although your datasets have a lot of attributes, they appear to be the same in both files. That also simplifies the process.
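If you want to confirm that a dataset is resizable before merging, you can compare its shape with its maxshape. The sketch below is only an illustration: the helper `is_resizable`, the file name `demo_resize.h5`, and the dataset names `grow` and `fixed` are hypothetical, and the 441 and 761 row counts are borrowed from the error message in the question.

```python
import h5py
import numpy as np

def is_resizable(dset, axis=0):
    """True if the dataset can still grow along `axis`."""
    limit = dset.maxshape[axis]  # None means unlimited
    return limit is None or limit > dset.shape[axis]

with h5py.File('demo_resize.h5', 'w') as h5f:
    # maxshape=(None, 200) leaves axis 0 unlimited; h5py turns on chunking for it
    grow = h5f.create_dataset('grow', shape=(441, 200), maxshape=(None, 200), dtype='f8')
    # Without maxshape, the dataset is fixed at its creation shape
    fixed = h5f.create_dataset('fixed', shape=(441, 200), dtype='f8')

    grow_ok = is_resizable(grow)
    fixed_ok = is_resizable(fixed)

    # Extend the resizable dataset along axis 0
    grow.resize(761, axis=0)
    new_shape = grow.shape

print(grow_ok, fixed_ok, new_shape)
```

A dataset created without `maxshape` cannot be resized later, so in that case the merged dataset would have to be created fresh with a suitable `maxshape` instead of being copied.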
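The code below goes through the following steps to merge the data:
1. Open the new merge file for writing.
2. Open the first data file and copy each dataset to the merge file (this carries over the shape and maxshape parameters, and attribute names and values).
3. Open the second data file and, for each dataset, extend the merged dataset with the .resize() method and append the new data.
4. Print shape and maxshape for all datasets (for visual comparison).
Code below: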
    import h5py

    files = [ '211008_778183_m.h5', '211008_778624_m.h5', 'merged_.h5' ]

    # Create the merge file:
    with h5py.File('merged_.h5','w') as h5fw:

        # Open first HDF5 file and copy each dataset.
        # Will use maxshape and attributes from existing dataset.
        with h5py.File(files[0],'r') as h5fr:
            for ds in h5fr.keys():
                h5fw.copy(h5fr[ds], h5fw, name=ds)

        # Open second HDF5 file and copy data from each dataset.
        # Resizes existing dataset as needed to hold new data.
        with h5py.File(files[1],'r') as h5fr:
            for ds in h5fr.keys():
                ds_a0 = h5fw[ds].shape[0]     # rows already in the merged dataset
                add_a0 = h5fr[ds].shape[0]    # rows to append from the second file
                h5fw[ds].resize(ds_a0+add_a0,axis=0)
                h5fw[ds][ds_a0:] = h5fr[ds][:]

    # The merge file is closed here, so it can be reopened for reading.
    for fname in files:
        print(f'Working on file:{fname}')
        with h5py.File(fname,'r') as h5f:
            for ds, h5obj in h5f.items():
                print(f'for: {ds}; shape={h5obj.shape}, maxshape={h5obj.maxshape}')
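If you have more than two files, the same copy-then-resize idea can be wrapped in a loop. This is a rough sketch rather than a tested tool: the function name `merge_h5` and the `demo_*.h5` file names are hypothetical, and it assumes every file holds the same top-level datasets, that those datasets are resizable along axis 0, and that only axis 0 differs between files.

```python
import h5py
import numpy as np

def merge_h5(sources, target):
    """Append every top-level dataset in `sources` along axis 0 into `target`."""
    with h5py.File(target, 'w') as h5fw:
        for i, src in enumerate(sources):
            with h5py.File(src, 'r') as h5fr:
                for name in h5fr.keys():
                    if i == 0:
                        # First file: copy data, maxshape and attributes as-is
                        h5fw.copy(h5fr[name], h5fw, name=name)
                    else:
                        # Later files: grow the merged dataset, then append
                        old = h5fw[name].shape[0]
                        extra = h5fr[name].shape[0]
                        h5fw[name].resize(old + extra, axis=0)
                        h5fw[name][old:] = h5fr[name][:]

# Throwaway input files so the sketch runs on its own
demo_files = []
for i, n_rows in enumerate([2, 4, 1]):
    fname = f'demo_part{i}.h5'
    with h5py.File(fname, 'w') as h5f:
        h5f.create_dataset('x', data=np.full((n_rows, 3), float(i)),
                           maxshape=(None, 3))
    demo_files.append(fname)

merge_h5(demo_files, 'demo_merged.h5')

with h5py.File('demo_merged.h5', 'r') as h5f:
    merged_shape = h5f['x'].shape
    merged_first_col = h5f['x'][:, 0].tolist()

print(merged_shape)
```

The files are appended in the order given, so sort the list by measurement time first if the order matters to the instrument software.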