I have 2 H5 files, file1.h5 and file2.h5. Some of the contents of the files are as follows:
file1:
file2:
Both files may contain other groups. I want to append the contents of group1 in file1 to the contents of group1 in file2 without overwriting the original contents of file2, so that at the end of the process, file2 has the following form:
I know the copy method of h5py can copy a group from one H5 file to another, but the code
import h5py
with h5py.File('file1.h5', 'r') as g:
    with h5py.File('file2.h5', 'a') as h:
        g.copy('group1', h)
will overwrite the original contents of file2, and I don't want to do that.
I know I could do the following:
import h5py
import pandas as pd
with h5py.File('file1.h5', 'r') as g:
    keynames = list(g['group1'].keys())
for name in keynames:
    df = pd.read_hdf('file1.h5', key='group1/' + name)
    df.to_hdf('file2.h5', key='group1/' + name, mode='a', append=True)
Is there a simpler, more convenient way to do this, along the lines of the h5py copy method?
I don't know if this is simpler, but it is a process to copy data without overwriting existing groups and datasets. It uses h5_object.visititems() to recursively visit all objects in a group and its subgroups. This retrieves groups and datasets one at a time. You write the "visitor function" to operate on the objects as they are found.
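As a minimal illustration of how visititems() drives a visitor function (the file and object names below are just for demonstration):

```python
# Minimal sketch of visititems(): the callback receives each object's
# path (relative to the group it is called on) and the object itself.
import h5py
import numpy as np

with h5py.File('visit_demo.h5', 'w') as f:
    f.create_dataset('a/b', data=np.zeros(3))   # implicitly creates group 'a'
    f.visititems(lambda name, node: print(name, type(node).__name__))
    # prints:
    # a Group
    # a/b Dataset
```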
The bulk of my example creates 2 files with groups and datasets (to demonstrate). Focus on def visitor_func(name, node); that is where the work is done. I included extra print statements to show what's happening. My visitor function does the following:
- It prints each object's name and its parent's path.
- It tests whether the object is a group or a dataset.
- If the object is a dataset that is not already in File2, the name parameter is used to copy it to the same location/path in File2.

Note that this code does NOT append data for common datasets from File 1 to File 2. For example, both files have a dataset '/group2/df1'. I DO NOT copy that data. I need to know more about your data structure to write the code to append. There are several things to consider if you want to append data to an existing dataset in File2. For example:
To append, the dataset in File2 must be resizable, which requires that it was created with the maxshape=() parameter. Resizable datasets also need chunked storage enabled. (I think a default chunk size is set when you use maxshape.) My example datasets highlight the challenge. All datasets are (10,10) ndarrays of floats. So, how should I append a (10,10) array in File 1 to a (10,10) array in File2? Should the result be a (20,10) array, a (10,20) array, or a (10,10,2) array?
All are logical and valid. The "correct answer" depends on your data schema.
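For instance, if you decided the result should be a (20,10) array, a resizable destination could be grown and written like this (a sketch only; the file and dataset names are hypothetical):

```python
import h5py
import numpy as np

arr1 = np.random.random((10, 10))
arr2 = np.random.random((10, 10))

with h5py.File('resize_demo.h5', 'w') as h5f:
    # maxshape=(None, 10) makes axis 0 unlimited; h5py enables chunked
    # storage automatically when any axis is unlimited
    ds = h5f.create_dataset('stacked', data=arr1, maxshape=(None, 10))
    ds.resize((20, 10))    # grow along axis 0
    ds[10:20, :] = arr2    # write the appended block
    print(ds.shape)        # (20, 10)
```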
Look at Methods 3a and 3b in this answer for some ideas: How can I combine multiple .h5 files?
Example code below:
import h5py
import numpy as np

def visitor_func(name, node):
    print('working on name:', name, ', path=', node.parent.name)
    if isinstance(node, h5py.Group):
        print('h5f1 object found:', name, 'is a group')
    elif isinstance(node, h5py.Dataset):
        print('h5f1 object found:', name, 'is a dataset')
        if name in h5f2:
            print('Object:', name, 'also in File2. Skipping...\n')
        else:
            print('Object:', name, 'NOT in File2. Copying...\n')
            h5f1.copy(node, h5f2, name=name)
# Create File1 with 2 Groups with 2 Datasets in each
with h5py.File('SO_65365873_1.h5', mode='w') as h5f1:
    h5f1.create_group('/group1')
    arr = np.random.random((10, 10))
    h5f1.create_dataset('/group1/df1', data=arr)
    arr = np.random.random((10, 10))
    h5f1.create_dataset('/group1/df2', data=arr)
    h5f1.create_group('/group2')
    arr = np.random.random((10, 10))
    h5f1.create_dataset('/group2/df1', data=arr)
    arr = np.random.random((10, 10))
    h5f1.create_dataset('/group2/df2', data=arr)
# Create File2 with 1 Group with 2 Datasets
with h5py.File('SO_65365873_2.h5', mode='w') as h5f2:
    h5f2.create_group('/group2')
    arr = np.random.random((10, 10))
    h5f2.create_dataset('/group2/df1', data=arr)
    arr = np.random.random((10, 10))
    h5f2.create_dataset('/group2/df3', data=arr)
# Copy data from File1 to File2 WITHOUT overwriting
with h5py.File('SO_65365873_1.h5', mode='r') as h5f1:
    with h5py.File('SO_65365873_2.h5', mode='a') as h5f2:
        h5f1.visititems(visitor_func)
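For reference, here is the same pattern condensed into a self-contained sketch (the file names are my own, for illustration) that also confirms File2 keeps its original data:

```python
import h5py
import numpy as np

# Source has group1/df1; destination has its own group2/df1
with h5py.File('demo_src.h5', 'w') as src:
    src.create_dataset('group1/df1', data=np.arange(4.0))
with h5py.File('demo_dst.h5', 'w') as dst:
    dst.create_dataset('group2/df1', data=np.ones(4))

# Copy only datasets missing from the destination;
# intermediate groups are created as needed
with h5py.File('demo_src.h5', 'r') as src, h5py.File('demo_dst.h5', 'a') as dst:
    def visitor(name, node):
        if isinstance(node, h5py.Dataset) and name not in dst:
            src.copy(node, dst, name=name)
    src.visititems(visitor)

with h5py.File('demo_dst.h5', 'r') as dst:
    print('group1/df1' in dst, 'group2/df1' in dst)   # True True
    print(dst['group2/df1'][:].tolist())              # original data intact: [1.0, 1.0, 1.0, 1.0]
```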