
HDF5 matrix reading with Python

I have a huge sequence (1,000,000) of small matrices (32x32) stored in an HDF5 file, each one with a label. Each of these matrices represents sensor data for a specific time.

I want to obtain the evolution of each pixel over a small time slice, which is different for each x,y position in the matrix.

This is taking more time than I expected.

  def getPixelSlice(self, xpixel, ypixel, initphoto, endphoto):
      # Indices of the h5 keys whose time lies in [initphoto, endphoto)
      valid = np.where(np.logical_and(self.photoList >= initphoto,
                                      self.photoList < endphoto))

      # Pixel values collected from the valid frames
      evolution = []

      # For each valid frame, open its dataset and append the target pixel
      for frame in valid[0]:
          data = self.h5f[str(self.photoList[frame])]
          evolution.append(data[ypixel][xpixel])

      return evolution, valid

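There is a problem here that took me a while to sort out for a similar application. Because of the physical limitations of hard drives, a three-dimensional array is stored in such a way that it is always faster to read along one orientation than along the others. Which orientation is fast depends on the order in which you stored the data. In your code each frame is its own dataset, so following one pixel through time means opening every frame in the range just to pick out a single value.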
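To make the orientation point concrete, here is a minimal NumPy sketch of how far apart two consecutive time samples of the same pixel sit under two storage orders. It works in memory rather than on disk, the frame count is made up, and the array names are only for illustration. The same reasoning carries over to a contiguous row-major layout in a file.

```python
import numpy as np

# Hypothetical, scaled-down sizes (the question has 1,000,000 frames).
n_frames, height, width = 1000, 32, 32

# Frame-by-frame order: time is the first (slowest-varying) axis.
frame_major = np.zeros((n_frames, height, width), dtype=np.float32)

# Pixel-by-pixel order: time is the last (fastest-varying) axis.
pixel_major = np.zeros((height, width, n_frames), dtype=np.float32)

# Bytes to skip to get from one time sample of a pixel to the next.
step_frame_major = frame_major.strides[0]  # a whole frame: 32 * 32 * 4 bytes
step_pixel_major = pixel_major.strides[2]  # one element: 4 bytes

print("frame-major step between time samples:", step_frame_major, "bytes")
print("pixel-major step between time samples:", step_pixel_major, "bytes")
```

With the frame-by-frame order, every new time sample of a pixel lies a full frame away from the previous one. With time as the last axis, the samples sit next to each other.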

How you handle this problem depends on your application. My specific application can be characterized as "write few, read many". In this case, it makes the most sense to store the data in the order in which I expect to read it. To do this, I use PyTables and specify a "chunkshape" that matches one of my time series. In your case, with all frames stored in a single 32x32x1000000 array, that would be (1,1,1000000). I'm not sure whether that size is too large, though, so you may need to break it down a bit further, say to (1,1,10000) or something like that.
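As a rough sketch of that idea, the snippet below stores all frames in a single PyTables CArray with time as the last axis and a chunkshape that runs along time only. The file name, node name, dtype, sizes and pixel coordinates are assumptions made for the example (and scaled down so it runs quickly); they are not taken from the question.

```python
import os
import tempfile

import numpy as np
import tables

# Assumed, scaled-down sizes; the question has 1,000,000 frames of 32x32.
N_FRAMES = 2000
HEIGHT, WIDTH = 32, 32
CHUNK_T = 1000  # time samples per chunk (hypothetical value)

# Hypothetical file location for the example.
path = os.path.join(tempfile.mkdtemp(), "frames_by_pixel.h5")

# Random stand-in for the sensor data, laid out as (y, x, time).
rng = np.random.default_rng(0)
reference = rng.random((HEIGHT, WIDTH, N_FRAMES), dtype=np.float32)

with tables.open_file(path, mode="w") as h5:
    # Each chunk holds CHUNK_T consecutive time samples of one pixel.
    frames = h5.create_carray(
        h5.root,
        "frames",
        atom=tables.Float32Atom(),
        shape=(HEIGHT, WIDTH, N_FRAMES),
        chunkshape=(1, 1, CHUNK_T),
    )
    # Writing goes against the chunk orientation, but it is done only once.
    frames[:, :, :] = reference

# Hypothetical pixel and time range to read back.
xpixel, ypixel = 5, 7
init_photo, end_photo = 250, 1250

with tables.open_file(path, mode="r") as h5:
    frames = h5.root.frames
    stored_chunkshape = frames.chunkshape
    # Only the chunks of this one pixel that overlap the range are read.
    evolution = frames[ypixel, xpixel, init_photo:end_photo]

print(stored_chunkshape, evolution.shape)
```

This layout trades slower writes for fast per-pixel reads, which is the "write few, read many" case. If you also need to read whole frames often, a chunkshape along time alone would probably work against you.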

For more info see PyTables Optimization Tips.

For applications where you intend to read along a specific orientation many times, it is crucial that you choose an appropriate chunk shape for your HDF5 arrays.
