I cannot save changes in an HDF5 file

I want to change some parts of my HDF5 file and save it. I can apply the changes I want, but somehow they are not saved to the file on disk. I have searched but cannot figure out what the problem is.

import h5py
import numpy as np

def create_or_update(hdf5_file, dataset_name, dataset_shape, dataset_type, dataset_value):
    """
    Create or update dataset in HDF5 file

    Parameters
    ----------
    hdf5_file : File
        File identifier
    dataset_name : str
        Name of new dataset
    dataset_shape : array_like
        Shape of new dataset
    dataset_type : type
        Type of dataset (np.float16, np.float32, 'S', etc...)
    dataset_value : array_like
        Data to store in HDF5 file
    """
    if dataset_name not in hdf5_file:
        # create an empty dataset with the requested shape/dtype, then fill it
        hdf5_file.create_dataset(dataset_name, dataset_shape, dataset_type)
        hdf5_file[dataset_name][:] = dataset_value
    else:
        if hdf5_file[dataset_name].shape != dataset_shape:
            # shape changed: delete the old dataset and recreate it
            del hdf5_file[dataset_name]
            hdf5_file.create_dataset(dataset_name, dataset_shape, dataset_type)
        hdf5_file[dataset_name][:] = dataset_value
    hdf5_file.flush()

hdf5_file = h5py.File(fp_deepinsight, mode='a')

create_or_update(hdf5_file=hdf5_file, dataset_name='outputs/' + decoded_var,
                 dataset_shape=(30521,), dataset_type=np.float32,
                 dataset_value=output_real)

One thing I noticed: you don't call hdf5_file.close() before you exit. Leaving the file open could be the problem. (If not, it will lead to problems in the future.) As an aside, Python's with/as file context manager is the preferred method for opening files, because it auto-magically handles the tear-down process when you exit the with/as block.
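For example, your open + update could be written like this (just a minimal sketch, reusing your fp_deepinsight, decoded_var, output_real and the create_or_update function from the question):

import h5py
import numpy as np

# Open in append mode; the file is flushed and closed automatically
# when the with block exits, even if an exception is raised.
with h5py.File(fp_deepinsight, mode='a') as hdf5_file:
    create_or_update(hdf5_file=hdf5_file,
                     dataset_name='outputs/' + decoded_var,
                     dataset_shape=(30521,),
                     dataset_type=np.float32,
                     dataset_value=output_real)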

I wrote a little code snippet to exercise your function. It works for me, both for 1) creating a new dataset and 2) updating (replacing) an existing dataset. Notice I have hdf5_file.close() after creating/updating the 2 datasets.

import h5py
import numpy as np
with h5py.File('SO_72479449.h5', mode='w') as h5f:
    arr = np.random.random(1000)
    h5f.create_dataset('/outputs/var1',data=arr)
    

hdf5_file = h5py.File('SO_72479449.h5', mode='a')

output_real = np.arange(2000)
create_or_update(hdf5_file=hdf5_file, dataset_name='outputs/var1',
                 dataset_shape=(2000,), dataset_type=np.float32, dataset_value=output_real)
create_or_update(hdf5_file=hdf5_file, dataset_name='outputs/var2',
                 dataset_shape=(2000,), dataset_type=np.float32, dataset_value=output_real+10000.)

hdf5_file.close()
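To confirm the changes really made it to disk, you can reopen the file read-only and take a look (a quick check against the file created above):

with h5py.File('SO_72479449.h5', mode='r') as h5f:
    # Both datasets should now be shape (2000,) and dtype float32.
    for name, ds in h5f['outputs'].items():
        print(name, ds.shape, ds.dtype)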

Once you get your function working, here are some suggestions to simplify the code:

  1. You don't need to pass the dtype and shape. You can get those from the data. (And you don't really need them...read on for an explanation.)
  2. When you have the data, you can create the dataset and add the data in one statement (with the data= parameter). The dataset dtype and shape are derived from the data values (see the short demo after this list).
  3. In other words, you don't need to create an empty dataset and then populate it with the data.
  4. You should also check the dataset dtype against the input data dtype. Skipping that check could lead to problems if you change dtypes.
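For instance, here is a tiny throwaway demo of points 1-3 (the file name is just an example):

import h5py
import numpy as np

with h5py.File('demo.h5', mode='w') as h5f:
    # Shape and dtype come straight from the array passed via data=.
    ds = h5f.create_dataset('demo', data=np.arange(10, dtype=np.float32))
    print(ds.shape, ds.dtype)   # -> (10,) float32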

Modified example shown below:

import h5py
import numpy as np

def create_or_update(hdf5_file, dataset_name, dataset_value):
    """
    Create or update dataset in HDF5 file
    Parameters
    ----------
    hdf5_file : File
        File identifier
    dataset_name : str
        Name of new dataset
    dataset_value : array_like
        Data to store in HDF5 file
    """
    if dataset_name not in hdf5_file:
        hdf5_file.create_dataset(dataset_name, data=dataset_value)
    else:
        if hdf5_file[dataset_name].shape != dataset_value.shape or \
           hdf5_file[dataset_name].dtype != dataset_value.dtype: 
            del hdf5_file[dataset_name]
            hdf5_file.create_dataset(dataset_name, data=dataset_value)
        else:
            # write the values into the existing dataset; assigning to the
            # name directly would fail because the dataset already exists
            hdf5_file[dataset_name][:] = dataset_value
    hdf5_file.flush()


with h5py.File('SO_72479449.h5', mode='w') as h5f:
    arr = np.random.random(1000)
    h5f.create_dataset('/outputs/var1',data=arr)
    
with h5py.File('SO_72479449.h5', mode='a') as h5f:  
    output_real = np.arange(1000).astype(int)
    create_or_update(hdf5_file=h5f, dataset_name='outputs/var1',
                     dataset_value=output_real)
    output_real = np.arange(2000)+10000.
    create_or_update(hdf5_file=h5f, dataset_name='outputs/var2',
                     dataset_value=output_real)
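Reading the file back is a quick way to confirm the behavior: 'outputs/var1' should now hold the integer data (the dtype check triggered a delete/recreate) and 'outputs/var2' should exist alongside it:

with h5py.File('SO_72479449.h5', mode='r') as h5f:
    print(h5f['outputs/var1'].shape, h5f['outputs/var1'].dtype)  # (1000,) and an integer dtype
    print(h5f['outputs/var2'].shape, h5f['outputs/var2'].dtype)  # (2000,) float64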
