
Reference arrays in h5py

I use h5py. I want a compound dataset in my HDF5 file with strings (column 1) and region references (column 2). For this, I am trying to define a NumPy dtype with a string field and a reference field.

But even before getting that far, I fail to create an array (or dataset) whose NumPy compound dtype holds HDF5 region references.

##map_h5py.py
import h5py
import numpy as np

h = h5py.File('testing_mapping.h5', 'a')
cell_names = ['cell0', 'cell1', 'cell2', 'cell3']
dummy_data = np.random.rand(4,20)

##create random data
dset = h.create_dataset('/data/Vm', data=dummy_data, dtype='float32')
#declare a compound data type with a single region-reference field
sp_type = np.dtype([('ref',h5py.special_dtype(ref=h5py.RegionReference))])

##this works - 1
refs_list = [] 
for ii in range(dset.shape[0]):
    refs_list.append(dset.regionref[ii])
h.create_dataset('/map/Vm_list', data=refs_list, dtype=h5py.special_dtype(ref=h5py.RegionReference))

##this doesn't - 2
ref_dset = h.create_dataset('/map/Vm_pre', shape=(dset.shape[0],), dtype=sp_type)
for ii in range(dset.shape[0]):
    ref_dset[ii] = dset.regionref[ii]

##this doesn't - 3
ref_numpy = np.zeros(dset.shape[0], dtype=sp_type)
for ii in range(dset.shape[0]):
    ref_numpy[ii] = dset.regionref[ii]
h.create_dataset('/map/Vm_post', data=ref_numpy, dtype=sp_type)

h.close()

The error in cases 2 and 3 is the following:

    ref_numpy[ii] = dset.regionref[ii]
ValueError: Setting void-array with object members using buffer.

I've run into the same problem and opened an issue at h5py (although this should probably go to NumPy, see below). Here I'll copy the important information from that issue, to give some insight into why this happens and how to get around it.

Here is a kinda-minimal example of the problem, which shows that the obvious ways to assign don't work:

import h5py
import numpy as np

with h5py.File('tst.hdf5', mode='w') as f:
    ds1 = f.create_dataset('ds1', shape=(1,), dtype=np.dtype([('objfield', h5py.special_dtype(ref=h5py.Reference))]))
    # scalar integer dataset; plain `int` replaces np.int, which was removed in NumPy 1.24
    ds2 = f.create_dataset('ds2', shape=(), dtype=int)

    # All these lines raise ValueError:
    # ds1[0, 'objfield'] = ds2.ref
    # ds1[0, 'objfield'] = (ds2.ref,)
    # ds1[0, 'objfield'] = np.array(ds2.ref)
    # ds1[0, 'objfield'] = np.array((ds2.ref,))
    # ds1[0, 'objfield'] = np.array((ds2.ref,), dtype=h5py.special_dtype(ref=h5py.Reference))
    # ds1[0, 'objfield'] = np.array(ds2.ref, dtype=np.dtype([('objfield', h5py.special_dtype(ref=h5py.Reference))]))

    # Only this one works: wrap the reference in a one-field record first
    ds1[0, 'objfield'] = np.array((ds2.ref,), dtype=np.dtype([('objfield', h5py.special_dtype(ref=h5py.Reference))]))

The last line here is the workaround I came up with.
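The same workaround can be wrapped in a small helper so it isn't repeated for every assignment. This is only an illustration, not part of h5py: `as_record` and `FakeRef` are hypothetical names, and a plain `object` field stands in for `h5py.special_dtype(ref=h5py.Reference)` (assumption: h5py's reference dtypes are NumPy object dtypes with extra metadata, so NumPy handles them the same way at this step).

```python
import numpy as np


def as_record(value, field, field_dtype):
    """Hypothetical helper: wrap `value` in a 0-d structured array with one field.

    Mirrors the workaround line:
    np.array((ref,), dtype=np.dtype([(field, ref_dtype)]))
    """
    record_dtype = np.dtype([(field, field_dtype)])
    # The 1-tuple tells NumPy to fill the record field by field,
    # instead of treating `value` as raw bytes for the whole record.
    return np.array((value,), dtype=record_dtype)


class FakeRef:
    """Stand-in for an h5py Reference object (assumption for this demo)."""

    def __init__(self, name):
        self.name = name


# Object dtype used here in place of the h5py reference dtype.
rec = as_record(FakeRef("ds2"), "objfield", np.dtype("O"))
```

With h5py the call would presumably be `ds1[0, 'objfield'] = as_record(ds2.ref, 'objfield', h5py.special_dtype(ref=h5py.Reference))`, but that exact line is untested here.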

The error is raised at this piece of code:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-19-63893646daac> in <module>()
      4 
      5     # All these lines raise ValueError:
----> 6     ds1[0, 'objfield'] = ds2.ref
      7 #     ds1[0, 'objfield'] = (ds2.ref,)
      8 #     ds1[0, 'objfield'] = np.array(ds2.ref)

/usr/local/lib/python2.7/dist-packages/h5py/_hl/dataset.pyc in __setitem__(self, args, val)
    506             val = numpy.asarray(val, dtype=dtype, order='C')
    507             if cast_compound:
--> 508                 val = val.astype(numpy.dtype([(names[0], dtype)]))
    509         else:
    510             val = numpy.asarray(val, order='C')

ValueError: Setting void-array with object members using buffer.

Looking at this code in dataset.py, the same error can easily be reproduced with plain NumPy, without HDF5 at all:

import numpy as np

objarr = np.array(123, dtype=np.dtype('O'))
objarr.astype(np.dtype([('field', np.dtype('O'))]))

The second line here raises exactly the same error. On the other hand, just like in the HDF5 example, this code works:

objarr = np.array((123, ), dtype=np.dtype([('field', np.dtype('O'))]))
objarr = np.asarray(objarr, dtype=np.dtype('O'))
objarr = objarr.astype(np.dtype([('field', np.dtype('O'))]))
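For case 3 in the question, where a NumPy array is filled before being written, two assignment styles avoid handing NumPy a bare object for a whole record. A minimal plain-NumPy sketch, assuming an `object` field can stand in for the h5py region-reference dtype, with strings as hypothetical placeholders for `dset.regionref[ii]`:

```python
import numpy as np

# Single object field, shaped like the question's sp_type
# (object stands in for the h5py region-reference dtype).
sp_type = np.dtype([("ref", "O")])
refs = ["ref0", "ref1", "ref2", "ref3"]  # placeholders for dset.regionref[ii]

# Option 1: assign each record as a 1-tuple, so NumPy fills it field by field.
by_tuple = np.zeros(len(refs), dtype=sp_type)
for ii, ref in enumerate(refs):
    by_tuple[ii] = (ref,)

# Option 2: assign through the field view, which is a plain object array
# sharing memory with the structured array.
by_field = np.zeros(len(refs), dtype=sp_type)
for ii, ref in enumerate(refs):
    by_field["ref"][ii] = ref
```

Either array could then be passed as `data=` to `create_dataset`, as in case 3; this has not been checked against a specific h5py version.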

So now you at least have an idea of why this happens and how to work around it :) If you want more than this, follow the issue mentioned above for the developers' answers (at the time of writing there are none).
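Coming back to the question's original goal (a name column plus a region-reference column), one option is to build every record as a tuple up front and write the whole array in one call. A sketch under stated assumptions: `object` fields stand in for `h5py.string_dtype()` and `h5py.regionref_dtype` (the helpers recent h5py versions provide for these types), and the `slice` objects are hypothetical placeholders for `dset.regionref[ii]`:

```python
import numpy as np

# Stand-ins (assumption): with h5py these would be h5py.string_dtype()
# and h5py.regionref_dtype, both object dtypes carrying h5py metadata.
name_dtype = np.dtype("O")
ref_dtype = np.dtype("O")

map_type = np.dtype([("name", name_dtype), ("ref", ref_dtype)])

cell_names = ["cell0", "cell1", "cell2", "cell3"]
# Hypothetical placeholders for dset.regionref[ii], one per row of the data.
region_refs = [slice(ii, ii + 1) for ii in range(len(cell_names))]

# One (name, ref) tuple per row: NumPy fills each record field by field.
mapping = np.array(list(zip(cell_names, region_refs)), dtype=map_type)
```

With h5py, swapping in the real dtypes and references and then calling something like `h.create_dataset('/map/Vm_compound', data=mapping)` (hypothetical dataset name) should be the corresponding step, though this is a sketch rather than a tested recipe.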

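Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.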

 
Guangdong ICP filing No. 18138465  © 2020-2024 STACKOOM.COM