How to insert/edit a column in an existing HDF5 dataset

Question

I have an HDF5 file as seen below. I would like to edit the index column and create a new timestamp index. Is there any way to do this?

Answer 1

This isn't possible unless you have the schema or specification that was used to create the HDF5 files in the first place.

Many things can go wrong if you attempt to use HDF5 files like a spreadsheet (even via h5py). For example:

Inconsistent chunk shape, compression, data types.
Homogeneous data becoming non-homogeneous.
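
To see what you would have to stay consistent with, you can inspect a dataset's layout before touching it. This is only a rough sketch: the file name, the dataset name "data", and the chunk and compression settings are made up for illustration, and the file is created here just so the snippet runs on its own.

```python
import os
import tempfile

import h5py
import numpy as np

# Hypothetical file and dataset names, used only for this illustration.
file_path = os.path.join(tempfile.mkdtemp(), "inspect_example.h5")

# Build a small chunked, gzip-compressed dataset to inspect.
with h5py.File(file_path, "w") as hdf:
    hdf.create_dataset(
        "data",
        data=np.zeros((100, 4), dtype="f4"),
        chunks=(10, 4),
        compression="gzip",
    )

# Read back the properties that any new or modified data would need to match.
with h5py.File(file_path, "r") as hdf:
    dset = hdf["data"]
    layout = {
        "shape": dset.shape,
        "dtype": str(dset.dtype),
        "chunks": dset.chunks,
        "compression": dset.compression,
    }

print(layout)
```

If the shape, dtype, chunking, or compression of what you want to write does not match what is reported here, you are in the territory where things tend to go wrong.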

What you could do is add a list as an attribute to the dataset. In fact, this is probably the right thing to do. Sample code below, with the input as a dictionary. When you read in the data, you link the attributes to the homogeneous data (by row, column, or some other identifier).

import os

import h5py


def add_attributes(hdf_file, attributes, path='/'):
    """Add or change attributes in path provided.
    Default path is root group.
    """
    # Fail early on a missing file or a non-dictionary argument.
    assert os.path.isfile(hdf_file), "File Not Found Exception '{0}'.".format(hdf_file)
    assert isinstance(attributes, dict), "attributes argument must be a key: value dictionary: {0}".format(type(attributes))

    # 'r+' opens the existing file for reading and writing.
    with h5py.File(hdf_file, 'r+') as hdf:
        for k, v in attributes.items():
            hdf[path].attrs[k] = v

    return "The following attributes have been added or updated: {0}".format(list(attributes.keys()))
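
For the timestamp case in the question, the same idea could look roughly like the sketch below: one timestamp per row is stored as an attribute on the dataset, then paired with the rows when reading. The file name, the dataset name "data", the attribute name "timestamps", and the sample values are all assumptions for illustration. The timestamps are written as fixed-length byte strings, which is one way to store text in an attribute, not the only one.

```python
import os
import tempfile

import h5py
import numpy as np

# Hypothetical file and dataset names, used only for this illustration.
file_path = os.path.join(tempfile.mkdtemp(), "timestamps_example.h5")

# A small homogeneous dataset: 3 rows, 2 float columns.
with h5py.File(file_path, "w") as hdf:
    hdf.create_dataset("data", data=np.arange(6, dtype="f8").reshape(3, 2))

# One timestamp per row (made-up values), stored as fixed-length byte strings.
timestamps = [
    "2018-02-12T12:00:00",
    "2018-02-12T12:01:00",
    "2018-02-12T12:02:00",
]
with h5py.File(file_path, "r+") as hdf:
    hdf["data"].attrs["timestamps"] = np.array(timestamps, dtype="S")

# Read everything back and link each timestamp to its row by position.
with h5py.File(file_path, "r") as hdf:
    dset = hdf["data"]
    values = dset[...]
    stored = [
        t.decode("ascii") if isinstance(t, bytes) else str(t)
        for t in dset.attrs["timestamps"]
    ]

rows_by_timestamp = {ts: row.tolist() for ts, row in zip(stored, values)}
print(rows_by_timestamp)
```

The dataset itself is never rewritten, so its chunking, compression, and dtype stay exactly as they were. The link between timestamps and rows exists only by position, so it is up to the reader of the file to keep the two in step.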

solution1
0 2018-02-12 12:53:45
