What is an efficient implementation of a custom map-style dataset for an HDF5 file with irregular structure?
I have an HDF5 file that contains pictures of a certain number of people, taken by a certain number of source cameras, over many seconds. So it is laid out like this:
file[seconds][person][camera].
But this is quite irregular: a given second may contain a different number of people, and a given second and person may have pictures from a different set of cameras. I want to create a map-style PyTorch dataset (torch.utils.data.Dataset), so I need to implement __getitem__(idx) so that each idx maps to a unique second, person and camera.
My first idea is to iterate through the whole dataset and create dictionaries that can be accessed with idx, that is, second[idx] = this_second, person[idx] = this_person, camera[idx] = this_camera. I can then use all of them to fetch a unique item from the dataset with:
file[this_second][this_person][this_camera].
However, this solution seems too complicated to me. I wonder if there is a better way to solve this, since it is probably a common problem.
I agree, a dictionary is too complicated. Instead, create an array where the first axis is the item index and the second axis holds 3 values: the associated second, person, and camera indices. If you plan to do this frequently, you can save the array as a dataset in the HDF5 file and then read the indices from that dataset.
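To make the index-array idea concrete, here is a minimal sketch of flattening an irregular second/person/camera hierarchy into an (N, 3) array. It assumes the file really is nested as file[second][person][camera] and relies on the fact that h5py groups support dict-style iteration and indexing, so a plain nested dict stands in for the file here. The name `build_index` and the toy keys are made up for illustration, and the rows hold key names rather than integers because HDF5 group and dataset names are strings.

```python
import numpy as np


def build_index(root):
    """Flatten a nested second -> person -> camera mapping into an (N, 3) array.

    `root` can be any nested mapping; an open h5py.File nested the same way
    is assumed to work too, since its groups iterate over their member names.
    """
    rows = []
    for second in root:                          # only seconds that exist
        for person in root[second]:              # only people seen in that second
            for camera in root[second][person]:  # only cameras that captured them
                rows.append((second, person, camera))
    # reshape keeps the (N, 3) shape even when the hierarchy is empty
    return np.array(rows, dtype=str).reshape(-1, 3)


# Toy stand-in for the file: a different number of people and cameras per second.
toy = {
    "0": {"alice": {"cam0": 1, "cam1": 2}},
    "1": {"alice": {"cam1": 3}, "bob": {"cam0": 4, "cam2": 5}},
}

index = build_index(toy)

# Row idx of the array identifies exactly one stored item.
second, person, camera = index[3].tolist()
item = toy[second][person][camera]
```

Because only combinations that actually exist are appended, the irregular structure needs no special handling: the array simply has one row per stored picture.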
Pseudo-code provided below:
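import h5py
import numpy as np

# Create the array of index values: one row per item,
# columns are (second, person, camera).
idx_arr = np.zeros((no_idxs, 3), dtype=int)
i_cnt = 0
# Loop on data:
for ...:
    # get this second, person, camera data
    # then add to index array
    idx_arr[i_cnt] = [this_second, this_person, this_camera]
    i_cnt += 1

# Save the index array in the file so it only has to be built once
with h5py.File(your_hdf5_file, 'a') as h5f:
    h5f.create_dataset('indices', data=idx_arr)

# Later: read the index dataset and use each row to fetch an image
# (this assumes the images live in one dataset indexed by second, person, camera)
with h5py.File(your_hdf5_file, 'r') as h5f:
    idx_ds = h5f['indices']
    img_ds = h5f['your_image_dataset_name']
    for row_arr in idx_ds:
        # use row_arr values to get next second/person/camera image
        img = img_ds[row_arr[0], row_arr[1], row_arr[2]]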
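For the map-style dataset the question asks about, one possible way to wrap this up is sketched below. It is not the method from the answer above: instead of an integer index array it collects the path of every stored image with h5py's `visititems`, and it assumes each image is an array dataset sitting exactly three levels deep (second/person/camera). The class name `H5ImageDataset` and the demo file contents are hypothetical.

```python
import os
import tempfile

import h5py
import numpy as np
import torch
from torch.utils.data import Dataset


class H5ImageDataset(Dataset):
    """Map-style dataset with one item per image dataset in the HDF5 file."""

    def __init__(self, h5_path):
        self.h5_path = h5_path
        self._h5 = None  # opened lazily so each DataLoader worker gets its own handle
        self.paths = []
        # One pass over the file: remember the path of every dataset (leaf).
        with h5py.File(h5_path, "r") as h5f:
            h5f.visititems(self._collect)

    def _collect(self, name, obj):
        # Groups are skipped; only leaves such as "1/bob/cam0" are indexed.
        if isinstance(obj, h5py.Dataset):
            self.paths.append(name)

    def __len__(self):
        return len(self.paths)

    def __getitem__(self, idx):
        if self._h5 is None:
            self._h5 = h5py.File(self.h5_path, "r")
        path = self.paths[idx]
        # Assumption: every path has exactly three components.
        second, person, camera = path.split("/")
        image = torch.from_numpy(self._h5[path][()])
        return image, (second, person, camera)


# Tiny irregular file for demonstration (all names here are made up).
demo_path = os.path.join(tempfile.mkdtemp(), "demo.h5")
with h5py.File(demo_path, "w") as h5f:
    h5f.create_dataset("0/alice/cam0", data=np.zeros((4, 4, 3), dtype=np.uint8))
    h5f.create_dataset("1/alice/cam1", data=np.ones((4, 4, 3), dtype=np.uint8))
    h5f.create_dataset("1/bob/cam0", data=np.full((4, 4, 3), 2, dtype=np.uint8))

dataset = H5ImageDataset(demo_path)
image, key = dataset[2]
```

Opening the file inside `__getitem__` rather than in `__init__` is a common precaution when the dataset is used with a multi-worker `DataLoader`, since an open h5py file handle should not be shared across processes. For a very large file, the list of paths could be cached (for example, saved once and reloaded) instead of being rebuilt on every run.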