How to access random indices from h5 data set?

I have some h5 data that I want to sample from using randomly generated indices. However, if the indices are not in increasing order, the read fails. Is it possible to select randomly generated indices from h5 data sets?

Here is an MWE that reproduces the error:

import h5py
import numpy as np
arr = np.random.random(50).reshape(10,5)
with h5py.File('example1.h5', 'w') as h5fw:
    h5fw.create_dataset('data', data=arr)

random_subset = h5py.File('example1.h5', 'r')['data'][[3, 1]]

# TypeError: Indexing elements must be in increasing order

I could sort the indices, but then we lose the randomness component.

As hpaulj mentioned, random indices aren't a problem for NumPy arrays in memory. So yes, it is possible to select data with randomly generated indices once the h5 data set has been read into a NumPy array. The catch is that you need enough memory to hold the whole data set. The code below shows how to do this:

# Read the whole dataset into memory, then index the NumPy array.
with h5py.File('example1.h5', 'r') as h5fr:
    arr = h5fr['data'][:]
random_subset = arr[[3, 1]]

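Another potential solution is to sort the desired indices before reading. Note that the rows then come back in sorted index order, not in the order originally requested: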

idx = np.sort([3, 1])
with h5py.File('example1.h5', 'r') as h5fr:
    random_subset = h5fr['data'][idx]
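If the original (random) order of the rows matters, one way to keep it without loading the whole data set is to read with sorted indices and then undo the sort in memory. The following is only a minimal sketch: the helper name `read_rows` is hypothetical (not part of h5py), and the use of `np.unique` to sort and de-duplicate in one step is an assumption about what is convenient here, not the method from the answers above.

```python
import os
import tempfile

import h5py
import numpy as np


def read_rows(dset, indices):
    """Return rows of `dset` in the order given by `indices` (hypothetical helper)."""
    indices = np.asarray(indices)
    # np.unique returns the sorted, de-duplicated indices plus the positions
    # needed to rebuild the original sequence from them.
    unique_idx, inverse = np.unique(indices, return_inverse=True)
    rows = dset[unique_idx]   # single read, indices strictly increasing
    return rows[inverse]      # reorder in memory to match `indices`


# Build the same kind of file as in the question, in a temporary directory.
arr = np.random.random(50).reshape(10, 5)
path = os.path.join(tempfile.mkdtemp(), 'example1.h5')
with h5py.File(path, 'w') as h5fw:
    h5fw.create_dataset('data', data=arr)

# Randomly generated, unsorted row indices.
rng = np.random.default_rng()
random_idx = rng.choice(arr.shape[0], size=4, replace=False)

with h5py.File(path, 'r') as h5fr:
    random_subset = read_rows(h5fr['data'], random_idx)
    fixed_subset = read_rows(h5fr['data'], [3, 1])  # indices from the question
```

Only the selected rows are read from disk, so this should also work when the full data set does not fit in memory; the reordering happens on the small in-memory result.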
