
hdf5 h5py strings in python 3: correct way to set or encode tuple with (float, string) to attributes

I understand that handling strings in hdf5 seems to be tricky - I am looking for a correct way to set attributes to a dataset where the attribute value is in the form of a tuple, (float/number/numpyarray, string).

Furthermore, the value read back must be equal to the value that was written, because I then compare the dataset attributes to an ordered dictionary of desired attributes.

What is the correct way to handle this?

So far I set the attributes using

    def setallattributes(dataset, dictattributes):
        for key, value in dictattributes.items():
            tup0 = value[0]
            tup1 = value[1].encode('utf-8')
            value = (tup0, tup1)
            dataset.attrs[key] = value

and I am trying to check the attributes match the desired attributes using

    for datasetname in list(group.keys()):
        dataset = f[datasetname]
        if dataset.size != 0:
            # Get (name, value) tuples for all attributes attached to this object.
            # On Py3, it's a collection or set-like object.
            saved_attributes = dataset.attrs.items()
            # Check attributes match -- both dicts, one ordered one not.
            if dict(saved_attributes) == input_attributes:
                datasetnamelist.append(datasetname)

This currently results in trying to compare things like

{'Rmax': array([b'200.0', b'ld'], dtype='|S32'), 'fracinc': array([b'0.5', b'$\\pi$'], dtype='|S32')} == OrderedDict([('Rmin', (0, 'ld')), ('Rmax',(1, 'ld')), ('fracinc',(0.5, r'$\pi$'))])

which returns False.

The h5py attribute documentation (http://docs.h5py.org/en/stable/high/attr.html) says:

They may be created from any scalar or NumPy array

data – Value of the attribute; will be put through numpy.array(data)

Making an array from your tuple:

In [115]: np.array((0, 'ld'))                                                   
Out[115]: array(['0', 'ld'], dtype='<U21')
In [116]: np.array((0, b'ld'))                  # for bytestring                                
Out[116]: array([b'0', b'ld'], dtype='|S21')    

Both elements end up as strings, so retrieving the attribute as a mixed-type tuple will be tricky.

Creating a structured array (compound dtype) might work:

In [122]: np.array((0, 'ld'), dtype='i,S10')                                    
Out[122]: array((0, b'ld'), dtype=[('f0', '<i4'), ('f1', 'S10')])
In [123]: print(_)                                                              
(0, b'ld')
In [124]: __.tolist()                                                                   
Out[124]: (0, b'ld')
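
The field types in the dtype string decide what survives the conversion. Here is a small numpy-only sketch comparing an integer first field with a float one. The 'f8' float field is my own assumption for illustration, not something taken from the question:

```python
import numpy as np

# Integer first field: the fractional part is dropped on conversion.
as_int = np.array((0.5, b'ld'), dtype='i,S10').item()

# Float first field ('f8' is an assumed choice): the value is kept as is.
as_float = np.array((0.5, b'ld'), dtype='f8,S10').item()

print(as_int)    # (0, b'ld')
print(as_float)  # (0.5, b'ld')
```

So if any of the numbers can be fractional, a float field is probably the safer pick. Note that 'S10' is also a fixed width, so it only suits strings that fit in 10 bytes.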

Saving your dictionary to a group:

In [126]: dd = dict([('Rmin', (0, 'ld')), ('Rmax',(1, 'ld')), ('fracinc',(0.5, r'$\pi$'))])     
In [131]: f = h5py.File('attrs.h5','w')                                         
In [132]: g = f.create_group('group')    
In [137]: for key in dd: 
     ...:     value = list(dd[key]) 
     ...:     value[1] = value[1].encode('utf-8') 
     ...:     value = np.array(tuple(value), dtype='int,S10') 
     ...:     g.attrs[key] = value 
     ...:                                                                       
In [138]: g.attrs                                                               
Out[138]: <Attributes of HDF5 object at 140472104481960>
In [139]: list(g.attrs.items())                                                 
Out[139]: [('Rmin', (0, b'ld')), ('Rmax', (1, b'ld')), ('fracinc', (0, b'$\\pi$'))]
In [140]: g.attrs['fracinc']                                                    
Out[140]: (0, b'$\\pi$')

This displays as a tuple, but is actually a numpy void. We need tolist() or item() to get a tuple that can be compared with another tuple:

In [142]: g.attrs['Rmin'].tolist()==(0,b'ld')                                   
Out[142]: True

Comparing this with dd['Rmin'] will require converting one string value to/from a bytestring.

In [146]: def foo(value): 
     ...:     value = list(value) 
     ...:     value[1] = value[1].encode('utf-8') 
     ...:     return tuple(value) 
     ...:                                                                       
In [147]: dd1 = {key:foo(value) for key,value in dd.items()}                    
In [148]: dd1                                                                   
Out[148]: {'Rmin': (0, b'ld'), 'Rmax': (1, b'ld'), 'fracinc': (0.5, b'$\\pi$')}
In [149]: g.attrs['Rmin'].tolist()==dd1['Rmin']                                 
Out[149]: True

The full set of saved attributes doesn't match dd1, because I saved it with an int field ('fracinc' has a float):

In [155]: {key:value.item() for key,value in g.attrs.items()}                   
Out[155]: {'Rmin': (0, b'ld'), 'Rmax': (1, b'ld'), 'fracinc': (0, b'$\\pi$')}

If I change foo to use int(value[0]), the dictionaries do match.

So if you need to do this kind of matching, you need to pass the test case through the same kind of processing that you (and h5py) apply to the saved value.
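
One way to keep that processing in a single place is a pair of save/load helpers. This is only a sketch under my own assumptions: the names save_attrs, load_attrs and RECORD_DTYPE are hypothetical, the first field is stored as a float ('f8') so 0.5 is not truncated, and the file is kept in memory with driver='core' so nothing is written to disk:

```python
import h5py
import numpy as np

# Assumed layout: a float value plus a byte string of at most 10 bytes.
RECORD_DTYPE = np.dtype([('value', 'f8'), ('unit', 'S10')])

def save_attrs(obj, attributes):
    """Store each (number, str) tuple as a compound-dtype attribute."""
    for key, (number, text) in attributes.items():
        obj.attrs[key] = np.array((number, text.encode('utf-8')),
                                  dtype=RECORD_DTYPE)

def load_attrs(obj):
    """Read the attributes back as plain (float, str) tuples."""
    restored = {}
    for key, stored in obj.attrs.items():
        number, text = stored.item()   # numpy void -> Python tuple
        restored[key] = (number, text.decode('utf-8'))
    return restored

dd = {'Rmin': (0, 'ld'), 'Rmax': (1, 'ld'), 'fracinc': (0.5, r'$\pi$')}

# In-memory file; backing_store=False means no file is left on disk.
with h5py.File('attrs_sketch.h5', 'w', driver='core',
               backing_store=False) as f:
    g = f.create_group('group')
    save_attrs(g, dd)
    restored = load_attrs(g)

print(restored == dd)
```

The integers come back as floats (0 becomes 0.0), but Python treats 0 == 0.0 as equal, so the dictionary comparison is not affected by that.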

I took a guess at your dictionary definition, and recreated your process as best I could. When you save your dictionary with tuples, the values are stored (and retrieved) as a String Array (reflected in the dtype in your output above).

You will have to deconstruct the array and convert each item to match the original data. So this process will be specific to the saved data types. I don't think it's possible to have a generic method to extract and test tuples converted to string arrays.

Solution:

import h5py

def setallattributes(dataset, dictattributes):
    for key, value in dictattributes.items():
        tup0 = value[0]
        tup1 = value[1].encode('utf-8')
        value = (tup0, tup1)
        dataset.attrs[key] = value

with h5py.File('SO_58064282.h5', 'w') as h5f:

    ds = h5f['/']  # attach the attributes to the root group
    input_attributes = {'Rmin': (0, 'ld'),
                        'Rmax': (1, 'ld'),
                        'fracinc': (0.5, r'$\pi$')}
    print('Input:\n', input_attributes)
    setallattributes(ds, input_attributes)

    # Each attribute comes back as an array of byte strings, so convert
    # the first item to float and decode the second item to str.
    saved_attrs_dict = {}
    print('Saved Attributes:')
    for name, stored in ds.attrs.items():
        print((name, stored))
        saved_attrs_dict[name] = (float(stored[0]),
                                  stored[1].decode('utf-8'))

    print('Converted to Dict:\n', saved_attrs_dict)
    # Check attributes match -- both dicts, one ordered one not.
    if saved_attrs_dict == input_attributes:
        print('Saved = Input')
    else:
        print('mismatched dictionaries')

    print('Done')
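
The conversion step can also be pulled out into a small helper and checked without a file. In this sketch, decode_pair is a hypothetical name of mine, and the array is copied from the output shown in the question as a stand-in for a stored attribute:

```python
import numpy as np

def decode_pair(stored):
    """Convert a 2-item byte-string array back to a (float, str) tuple."""
    number, text = stored
    return float(number), text.decode('utf-8')

# Stand-in for what dataset.attrs['fracinc'] returns in the question.
stored = np.array([b'0.5', b'$\\pi$'], dtype='S32')
decoded = decode_pair(stored)

print(decoded == (0.5, r'$\pi$'))
```

This assumes the first item always parses as a number. An attribute saved from a numpy array rather than a scalar would need its own conversion.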
