How do I increase the dimension size of a dataset in an HDF5 file using h5py?

Question

The current dimension size is set to 32 characters. Is there any way to increase this using h5py?

I am having a problem where the values in my datasets are getting cut off because they are too long.

Answer 1

To understand what you see in HDFView, an explanation of the HDF5 schema is in order. In your figure above, "Data Type: Compound" means this dataset holds heterogeneous data, and "Dimension Size: 32" means there are 32 rows of data. It DOES NOT tell you the type of each field (column) or the allocated size of any string fields. There are 2 ways to get this info:

Scroll down the General Object Info panel to the section titled Compound Dataset Members. It will show each field's datatype and string length (when appropriate). A snapshot from an example file I created is shown below.
You can also get it programmatically from the .dtype attribute on the dataset. There is a code snippet below that shows how to do that (for a file named 'SO_74404059.h5' with a dataset named 'Example').
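As a minimal sketch of the programmatic route, the following lists each field of a compound dataset together with its NumPy type and allocated width in bytes. The file name 'dtype_demo.h5' and the in-memory 'core' driver are assumptions made so the example runs without touching disk; the field layout copies the example dtype used in this answer.

```python
import h5py
import numpy as np

# Hypothetical layout mirroring the example dataset in this answer.
dt = np.dtype([('ID', '<i4'), ('Name', 'S16'), ('Path', 'S32'), ('Type', 'S8')])

# driver='core' with backing_store=False keeps the file in memory only.
with h5py.File('dtype_demo.h5', 'w', driver='core', backing_store=False) as h5f:
    ds = h5f.create_dataset('Example', shape=(5,), dtype=dt)

    field_info = {}
    for name in ds.dtype.names:
        field_dtype = ds.dtype[name]
        # kind 'S' marks a fixed-length byte string; itemsize is its allocated width
        field_info[name] = (field_dtype.kind, field_dtype.itemsize)
        print(name, field_dtype, field_dtype.itemsize)
```

For a real file, open it with mode 'r' and skip the create_dataset call; the loop over ds.dtype.names stays the same.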

Now, on to your question about the string size. First, check if the strings are being truncated, or just appear that way in HDFView. Again, there are 2 ways to do this:

In HDFView, you can use the mouse to drag the column separators to modify the width. This image shows how I modified my view:
My code example also shows how to print the contents of the file. (Notice how the strings are byte strings and not Unicode, e.g., b'text'. You will have to convert if/when you read them. That is a different topic answered in another SO Q&A.)
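The check for truncation can also be done in code. The sketch below flags values that fill the whole field width (a hint that they may have been cut off, not proof) and decodes the byte strings to str. The two-field dtype, the sample paths, and the in-memory file name 'truncation_demo.h5' are made up for illustration; UTF-8 is assumed for decoding.

```python
import h5py
import numpy as np

dt = np.dtype([('ID', '<i4'), ('Path', 'S32')])
# NumPy silently cuts the second path down to 32 bytes when it is stored as 'S32'.
rows = np.array([(0, b'/data/cns/path1'),
                 (1, b'/data/cns/a/very/long/path/that/does/not/fit')], dtype=dt)

with h5py.File('truncation_demo.h5', 'w', driver='core', backing_store=False) as h5f:
    ds = h5f.create_dataset('Example', data=rows)

    paths = ds['Path']             # NumPy array of fixed-length byte strings
    width = paths.dtype.itemsize   # allocated bytes per value

    # A value that uses every allocated byte may have been truncated on write.
    suspect = [i for i, p in enumerate(paths) if len(p) == width]

    # Convert byte strings to Python str (assumes UTF-8 content).
    decoded = [p.decode('utf-8') for p in paths]

print(suspect)
print(decoded)
```

A value that is legitimately exactly 32 bytes long would also be flagged, so treat the result as a list of rows to inspect by hand.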

Finally, to answer your question (about the path name getting truncated): there is no way to modify an existing dataset to increase the field size if the allocated string length is too short. You have to create a new dataset with a dtype that defines string fields that are long enough for your names. It's hard to provide specific info without more details about how this file was created.

Code below:

import h5py

with h5py.File('SO_74404059.h5', 'r') as h5f:
    # print field names and datatypes
    print(h5f['Example'].dtype)
    # print data in row[0]
    print(h5f['Example'][0])
    # print data in field['Path']
    print(h5f['Example']['Path'])

### dtype output is:
[('ID', '<i4'), ('Name', 'S16'), ('Path', 'S32'), ('Type', 'S8')]

### row[0] output is:
(0, b'Art,Diag', b'/data/cns/path1', b'cns') 

### field['Path'] output is:
[b'/data/cns/path1' b'/data/cns/path2' b'/data/cns/path3'
 b'/data/cns/path4' b'/data/cns/path5']
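The "create a new dataset" step described above might look like the following sketch. It reads the old rows, builds a dtype with a wider 'Path' field, copies the data field by field, and replaces the dataset. The new width of 128 bytes, the sample rows, and the in-memory file name 'widen_demo.h5' are assumptions for illustration, not details from the original file.

```python
import h5py
import numpy as np

old_dt = np.dtype([('ID', '<i4'), ('Name', 'S16'), ('Path', 'S32'), ('Type', 'S8')])
rows = np.array([(0, b'Art,Diag', b'/data/cns/path1', b'cns'),
                 (1, b'Art,Diag', b'/data/cns/path2', b'cns')], dtype=old_dt)
long_path = b'/data/cns/' + b'x' * 60   # 70 bytes, too long for 'S32'

with h5py.File('widen_demo.h5', 'w', driver='core', backing_store=False) as h5f:
    h5f.create_dataset('Example', data=rows)   # stands in for the existing dataset

    # 1. Read the existing rows into memory.
    old_data = h5f['Example'][...]

    # 2. Same fields, but 'Path' gets a wider (assumed) width of 128 bytes.
    new_dt = np.dtype([(name, 'S128' if name == 'Path' else old_data.dtype[name])
                       for name in old_data.dtype.names])

    # 3. Copy field by field into an array with the new dtype.
    new_data = np.empty(old_data.shape, dtype=new_dt)
    for name in new_dt.names:
        new_data[name] = old_data[name]

    # 4. Replace the dataset (the old rows are already held in memory).
    del h5f['Example']
    h5f.create_dataset('Example', data=new_data)

    # 5. A long path now fits without being cut off.
    ds = h5f['Example']
    data = ds[...]
    data['Path'][0] = long_path
    ds[...] = data

    new_width = ds.dtype['Path'].itemsize
    stored = ds['Path'][0]
    untouched = ds['Path'][1]

print(new_width, stored)
```

Widening the field does not bring back values that were already truncated when they were first written; those have to be written again from their original source.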

Answer 2

No, you cannot.

The easiest way is to use the HDF Product Designer GUI tool:

Import your current HDF5 file.
Modify your HDF5 design.
Get h5py code for the new design.

2 answers

solution1
0 2022-11-11 16:37:12

solution2
0 2022-11-13 17:55:13