
Umlauts in hdf5 files using python

I want to store strings in HDF5 files using Python's h5py, which works perfectly as long as there are no umlauts or other special characters in the Unicode string:

# -*- coding: utf-8 -*-

import h5py

dtype = h5py.special_dtype(vlen=unicode)
wdata = u"Ärger"

with h5py.File("test.h5", 'w') as f:
    dset = f.create_dataset("DS1", (1,), dtype=dtype)
    dset[...] = wdata


with h5py.File("test.h5") as f:
    rdata = f["DS1"].value
print rdata    

Instead of Ärger, the output is u'\xc4rger'

Is it possible to store umlauts in hdf5 files? How?

You need to choose an encoding for your data that works with HDF5 (and keep track of which encoding you used, so that you can recover the data correctly later). Essentially, an encoding serializes characters outside the ASCII range into byte sequences, which look like escape sequences when displayed and can later be decoded back into text that is readable in your terminal or elsewhere.

Just because you're using a u"" string in Python doesn't mean that the string is encoded in a way that will work for this situation.

hdf5 docs on using unicode
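
To make the encode/decode round trip described above concrete, here is a minimal sketch. It assumes Python 3 (the code in the question is Python 2) and deliberately leaves h5py out, so it only illustrates the general idea of keeping track of the encoding, not how HDF5 stores strings internally.

```python
# -*- coding: utf-8 -*-
# Assumption: Python 3, where str is Unicode text and bytes is encoded data.

text = "\u00c4rger"  # the word "Ärger", written with an escape for portability

# Encoding turns text into bytes; in UTF-8 the umlaut takes two bytes.
encoded = text.encode("utf-8")

# Decoding with the same encoding recovers the original text.
decoded = encoded.decode("utf-8")

# Decoding with a different encoding does not fail here, but it
# silently produces different characters.
wrong = encoded.decode("latin-1")

print(encoded)           # the raw bytes, shown with \x escapes
print(decoded == text)   # round trip succeeded
print(wrong == text)     # mismatch when the encoding is not tracked
```

The escaped bytes printed here are just a display form of the stored data; the text itself is intact as long as it is decoded with the encoding that was used to write it.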

Thank you for your help. The following code works; the problem apparently was that the dataset is an array and the correct element was not selected:

# -*- coding: utf-8 -*-

import h5py

dtype = h5py.special_dtype(vlen=unicode)
wdata = u"umlauts, in HDF5, for example öüßÄ might cause trouble"

print wdata

with h5py.File("test.h5", 'w') as f:
    dset = f.create_dataset("DS1", (1,), dtype=dtype)
    dset[...] = wdata


with h5py.File("test.h5") as f:
    rdata = f["DS1"].value[-1]

print rdata

Greetings
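
For anyone wondering why selecting the element makes a difference, the following sketch may help. It is written under some assumptions: it uses Python 3, a plain list as a stand-in for the one-element dataset, and the built-in ascii() to imitate how Python 2 displays a container of unicode strings. It does not use h5py at all.

```python
# Assumption: Python 3. A plain list stands in for the one-element dataset.
stored = ["\u00c4rger"]  # one element, the word "Ärger"

# Displaying the whole container shows an escaped representation,
# similar to what Python 2 prints for an array of unicode strings.
container_view = ascii(stored)

# Selecting the element gives the actual string, with the umlaut intact.
element = stored[-1]

print(container_view)
print(len(element))  # 5 characters, so nothing was lost or mangled
```

So the escape sequence appears only in the representation of the container; the stored string itself already contains the umlaut.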
