Umlauts in hdf5 files using python

Question

I want to store strings in HDF5 files using Python's h5py, which works perfectly as long as there are no umlauts or other special characters in the Unicode string:

# -*- coding: utf-8 -*-

import h5py

dtype = h5py.special_dtype(vlen=unicode)
wdata = u"Ärger"

with h5py.File("test.h5", 'w') as f:
    dset = f.create_dataset("DS1", (1,), dtype=dtype)
    dset[...] = wdata


with h5py.File("test.h5") as f:
    rdata = f["DS1"].value
print rdata

Instead of Ärger, the output is u'\xc4rger'.

Is it possible to store umlauts in hdf5 files? How?

Answer 1

You need to set an encoding for your data that will work for HDF5 (and presumably keep track of which encoding you're using so that you can recover the data correctly later). Essentially, an encoding serializes characters outside the ASCII range into bytes that look like escape sequences when displayed, and that can later be decoded back into text that is readable in your terminal or elsewhere.

Just because you're using a u"" string in Python doesn't mean that the string is encoded in a particular way that will work for this situation.
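To make the relationship between the escaped form and the readable text concrete, here is a minimal sketch in Python 3 syntax (where every str is already Unicode). The variable names are illustrative only; it shows that '\xc4rger' is merely another spelling of Ärger, and that encoding and decoding with the same codec is lossless.

```python
# Python 3 sketch: the escaped form and the readable form are the same string.
text = "Ärger"

# "\xc4" is just an escape for the code point U+00C4 (Ä).
assert text == "\xc4rger"

# ascii() gives an escaped representation, similar to Python 2's repr().
escaped = ascii(text)

# Encoding turns text into bytes; decoding with the same codec restores it.
encoded = text.encode("utf-8")
decoded = encoded.decode("utf-8")

print(escaped)  # '\xc4rger'
print(encoded)  # b'\xc3\x84rger'
print(decoded)  # Ärger
```

So seeing an escape sequence does not by itself mean the data was damaged; it may only be how the value is being displayed.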

hdf5 docs on using unicode

Answer 2

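Thank you for your help. The problem apparently was that the dataset is an array and the correct element was not selected: printing the whole array shows the escaped repr of its elements, whereas printing a single element shows the text itself. The following code works: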

# -*- coding: utf-8 -*-

import h5py

dtype = h5py.special_dtype(vlen=unicode)
wdata = u"umlauts, in HDF5, for example öüßÄ might cause trouble"

print wdata



with h5py.File("test.h5", 'w') as f:
    dset = f.create_dataset("DS1", (1,), dtype=dtype)
    dset[...] = wdata


with h5py.File("test.h5") as f:
    # take the last (and only) element, not the whole array
    rdata = f["DS1"].value[-1]

print rdata
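
On Python 3, where `unicode` no longer exists, and with h5py 3.x, where `.value` has been removed, the same round trip might look like the sketch below. It assumes h5py 3.x is installed; the `roundtrip` helper and the temporary file are illustrative and not part of the original answer.

```python
import os
import tempfile

import h5py  # assumption: h5py 3.x


def roundtrip(text):
    """Write one string to a throwaway HDF5 file and read it back as str."""
    # Variable-length UTF-8 string type, the newer spelling of
    # special_dtype(vlen=...).
    dtype = h5py.string_dtype(encoding="utf-8")

    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "test.h5")  # hypothetical file name

        with h5py.File(path, "w") as f:
            dset = f.create_dataset("DS1", (1,), dtype=dtype)
            dset[0] = text

        with h5py.File(path, "r") as f:
            # asstr() decodes the stored bytes; [0] picks the single element.
            return f["DS1"].asstr()[0]


result = roundtrip("öüßÄ")
print(result)
```

As in the code above, the key step is indexing a single element rather than printing the whole array.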

Greetings

solution1
0 ACCEPTED 2015-07-29 14:06:32

solution2
0 2015-07-31 14:26:42
