Quickly read HDF 5 file in python?

Question

I have an instrument that saves data (many traces from an analog-to-digital converter) as an HDF 5 file. How can I efficiently open this file in python? I have tried the following code, but it seems to take a very long time to extract the data.

Also, it reads the data in the wrong order: instead of reading 1,2,3, it reads 1,10,100,1000.

Any ideas?

Here is a link to the sample data file: https://drive.google.com/file/d/0B4bj1tX3AZxYVGJpZnk2cDNhMzg/edit?usp=sharing

And here is my super-slow code:

import h5py
import matplotlib.pyplot as plt
import numpy as np


f = h5py.File('sample.h5','r')

ks = f.keys()

for index,key in enumerate(ks[:10]):
    print index, key
    data = np.array(f[key].values())
    plt.plot(data.ravel())

plt.show()

Answer 1

As far as the order of your data:

In [10]: f.keys()[:10]
Out[10]:
[u'Acquisition.1',
 u'Acquisition.10',
 u'Acquisition.100',
 u'Acquisition.1000',
 u'Acquisition.1001',
 u'Acquisition.1002',
 u'Acquisition.1003',
 u'Acquisition.1004',
 u'Acquisition.1005',
 u'Acquisition.1006']

This is the correct order for numbers that isn't left padded with zeros. It's doing its sort lexicographically, not numerically. See Python: list.sort() doesn't seem to work for a possible solution.

Second, you're killing your performance by rebuilding the array within the loop:

In [20]: d1 = f[u'Acquisition.990'].values()[0][:]

In [21]: d2 = np.array(f[u'Acquisition.990'].values())

In [22]: np.allclose(d1,d2)
Out[22]: True

In [23]: %timeit d1 = f[u'Acquisition.990'].values()[0][:]
1000 loops, best of 3: 401 µs per loop

In [24]: %timeit d2 = np.array(f[u'Acquisition.990'].values())
1 loops, best of 3: 1.77 s per loop

Quickly read HDF 5 file in python?

Question

1 answers

solution1
3 ACCPTED 2014-04-19 03:25:46

Quickly read HDF 5 file in python?

Question

1 answers

solution1 3 ACCPTED 2014-04-19 03:25:46

solution1
3 ACCPTED 2014-04-19 03:25:46