
How do I increase the dimension size of a dataset in an HDF5 file using h5py?

[image]

The current dimension size is set to 32 characters. Is there any way to increase this using h5py?

I am having a problem where the values in my datasets are getting cut off because they are too long. [image]

To understand what you see in HDFView, an explanation of the HDF5 schema is in order. In your figure above, "Data Type: Compound" means this dataset holds heterogeneous data, and "Dimension Size: 32" means there are 32 rows of data. It DOES NOT tell you the type of each field (column) or the allocated size of any string fields. There are 2 ways to get this info:

  1. Scroll down the General Object Info panel to the section titled Compound Dataset Members. It will show each field's datatype and string length (when appropriate). A snapshot from an example file I created is shown below.
    [Image: HDFView view of the example dataset]
  2. You can also get it programmatically from the .dtype attribute on the dataset. There is a code snippet below that shows how to do that (for a file named 'SO_74404059.h5' with a dataset named 'Example').
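
As a small illustration of the second option, here is a sketch that pulls the allocated length of each string field out of a compound dtype. The helper name string_field_lengths is hypothetical, and it assumes the fields are fixed-length byte strings (NumPy kind 'S'), as in the example file; variable-length strings would not be reported. With h5py you would pass h5f['Example'].dtype instead of the hand-built dtype used here.

```python
import numpy as np

def string_field_lengths(dtype):
    """Return {field name: allocated length in bytes} for fixed-length string fields."""
    return {
        name: dtype[name].itemsize
        for name in dtype.names
        if dtype[name].kind == 'S'   # 'S' = fixed-length byte string
    }

# Hand-built dtype with the same layout as the 'Example' dataset
example_dtype = np.dtype([('ID', '<i4'), ('Name', 'S16'), ('Path', 'S32'), ('Type', 'S8')])
print(string_field_lengths(example_dtype))
# {'Name': 16, 'Path': 32, 'Type': 8}
```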

Now, on to your question about the string size. First, check if the strings are being truncated, or just appear that way in HDFView. Again, there are 2 ways to do this:

  1. In HDFView, you can use the mouse to drag the column separators to modify the width. This image shows how I modified my view:
    [Image: example HDFView view with modified column widths]
  2. My code example also shows how to print the contents of the file. (Notice how the strings are byte strings and not Unicode, e.g., b'text'. You will have to convert them if/when you read them. That is a different topic answered in another SO Q&A.)
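
For the byte-string conversion mentioned in the second item, a minimal sketch follows. It assumes the stored text is UTF-8 encoded; the paths array is a hypothetical stand-in for what h5f['Example']['Path'] returns.

```python
import numpy as np

# Hypothetical stand-in for the array read from h5f['Example']['Path']
paths = np.array([b'/data/cns/path1', b'/data/cns/path2'], dtype='S32')

# Decode a single value to a Python str
first = paths[0].decode('utf-8')

# Decode the whole array at once (result is a NumPy array of Unicode strings)
decoded = np.char.decode(paths, 'utf-8')

print(first)
print(decoded)
```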

Finally, to answer your question (about the path name getting truncated): there is no way to modify an existing dataset to increase the field size if the allocated string length is too short. You have to create a new dataset with a dtype that defines string fields that are long enough for your names. It's hard to provide specific info without more details about how this file was created.
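
One way to create such a replacement dataset is sketched below, under some assumptions: the function name widen_string_field and the dataset name you pass as new_name are hypothetical, the field to widen is a fixed-length byte string, and the dataset is small enough to read into memory. It copies the rows into a new dataset in the same file whose chosen field has a longer 'S' type. It cannot restore characters that were already cut off when the data was first written; those values have to be written again from the original source.

```python
import h5py
import numpy as np

def widen_string_field(file_name, old_name, new_name, field, new_len):
    """Copy a compound dataset into a new one whose `field` is a longer fixed-length string."""
    with h5py.File(file_name, 'r+') as h5f:
        old_ds = h5f[old_name]
        old_dt = old_ds.dtype
        # Rebuild the compound dtype, replacing only the chosen field's type
        new_dt = np.dtype([
            (name, f'S{new_len}' if name == field else old_dt[name])
            for name in old_dt.names
        ])
        # Read all rows, then copy field by field into the wider layout
        old_data = old_ds[()]
        new_data = np.empty(old_data.shape, dtype=new_dt)
        for name in old_dt.names:
            new_data[name] = old_data[name]
        h5f.create_dataset(new_name, data=new_data)
        return new_dt
```

For example, widen_string_field('SO_74404059.h5', 'Example', 'Example_wide', 'Path', 64) would add a dataset named 'Example_wide' with a 64-byte 'Path' field next to the original.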

Code below:

import h5py

with h5py.File('SO_74404059.h5', 'r') as h5f:
    # print field names and datatypes
    print(h5f['Example'].dtype)
    # print data in row[0]
    print(h5f['Example'][0])
    # print data in field['Path']
    print(h5f['Example']['Path'])

### dtype output is:
[('ID', '<i4'), ('Name', 'S16'), ('Path', 'S32'), ('Type', 'S8')]

### row[0] output is:
(0, b'Art,Diag', b'/data/cns/path1', b'cns') 

### field['Path'] output is:
[b'/data/cns/path1' b'/data/cns/path2' b'/data/cns/path3'
 b'/data/cns/path4' b'/data/cns/path5']

No, you cannot.

The easiest way is to use the HDF Product Designer GUI tool.

  1. Import your current HDF5 file.
  2. Modify your HDF5 design.
  3. Get h5py code for the new design.
