Converting dates from HDF5 dataset to numpy array

I have an HDF5 dataset containing a matrix of dates. I load it in my Python script and want to use it as a NumPy array:

>>> import h5py
>>> import numpy as np
>>> mat = h5py.File('xyz.mat')
>>> dates = mat['dates']
>>> dates
<HDF5 dataset "dates": shape (11, 285), type "<u2">

When I try to convert it to a NumPy array, I get the following error:

>>> dates = np.array(dates, dtype='datetime64')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/soft/python-epd/canopy-1.1.0-standalone/lib/python2.7/site-packages/h5py/_hl/dataset.py", line 490, in __array__
    self.read_direct(arr)
  File "/soft/python-epd/canopy-1.1.0-standalone/lib/python2.7/site-packages/h5py/_hl/dataset.py", line 460, in read_direct
    self.id.read(mspace, fspace, dest)
  File "h5d.pyx", line 173, in h5py.h5d.DatasetID.read (h5py/h5d.c:2523)
  File "h5t.pyx", line 1439, in h5py.h5t.py_create (h5py/h5t.c:11361)
TypeError: No conversion path for dtype: dtype('<M8')

Each date in the dataset is of the form "05-Mar-2012".

It seems that your dates are stored… strangely. Your dataset is an 11 x 285 matrix of 16-bit unsigned integers. (It smells like it was exported from MATLAB.)

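Basically, the problem is that the conversion is attempted on each element of the matrix (that is, on each individual character of the dates), and, as the traceback shows, h5py has no conversion path from a 16-bit integer to the datetime64 dtype ('<M8').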

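From HDF5's point of view, it would make much more sense to store them as a 285-element array of 11-character strings. Each element would then be a whole date instead of a single character. Note that NumPy's datetime64 only parses ISO 8601 text such as 2012-03-05, so strings like 05-Mar-2012 would still have to be parsed first (for example with datetime.strptime).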

If you cannot change how the file was generated, you can reconstruct the strings in Python by concatenating the 11 characters in each of the 285 columns of the matrix. But that would be dirty; you'd better fix how the file is generated ;)
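For what it's worth, here is a minimal sketch of that "dirty" reconstruction. It assumes each column of the matrix holds the character codes of one date in the "05-Mar-2012" form mentioned in the question. The function name decode_matlab_dates and the two sample dates are made up for illustration, and the small array built here only stands in for the real dataset.

```python
from datetime import datetime

import numpy as np


def decode_matlab_dates(codes, fmt="%d-%b-%Y"):
    """Turn an (n_chars, n_dates) matrix of character codes into datetime64[D].

    Hypothetical helper: assumes every column spells one date in `fmt`.
    Note that %b (abbreviated month name) depends on the current locale.
    """
    codes = np.asarray(codes, dtype=np.uint16)
    # Each column is one date: join its character codes back into a string.
    strings = ["".join(chr(int(c)) for c in column) for column in codes.T]
    # Parse the text explicitly, since datetime64 only understands ISO 8601.
    parsed = [datetime.strptime(s, fmt).date() for s in strings]
    return np.array(parsed, dtype="datetime64[D]")


# Stand-in for the real dataset: two made-up dates stored as an 11 x 2 matrix.
sample = ["05-Mar-2012", "17-Nov-2013"]
codes = np.array([[ord(ch) for ch in s] for s in sample], dtype=np.uint16).T

dates = decode_matlab_dates(codes)
print(dates)
```

With the actual file, you would presumably read the whole dataset into memory first, e.g. decode_matlab_dates(mat['dates'][()]), and get back an array of 285 dates.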
