What is an efficient implementation of a custom map-style dataset for an HDF5 file with an irregular structure?

Question

I have an HDF5 file that contains pictures of a number of people, taken by a number of source cameras, over many seconds. So it is organized like this:

file[seconds][person][camera].

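But the structure is very irregular: a given second can contain a different number of people, and for a given second and person there can be pictures from a different set of cameras. I want to create a map-style PyTorch dataset (`torch.utils.data.Dataset`), so I need to implement `__getitem__(idx)`, which must return a unique second, person, and camera for each idx.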

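My first idea was to iterate over the whole dataset once and build dictionaries indexed by idx, i.e. second[idx] = this_second, person[idx] = this_person, camera[idx] = this_camera. I could then use all of them together to fetch a unique item from the file: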

 file[this_second][this_person][this_camera].
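
The three-dictionary idea can be sketched in plain Python as follows. The nested dicts are a hypothetical stand-in for the HDF5 file (an h5py group also iterates over its member names and supports `.items()`); the keys and image values are made up for illustration.

```python
# Hypothetical stand-in for the HDF5 file: nested dicts mirror
# file[second][person][camera] with an irregular number of children.
file = {
    "0": {"alice": {"cam0": "img_a0", "cam2": "img_a2"}},
    "1": {"alice": {"cam1": "img_a1"}, "bob": {"cam0": "img_b0"}},
}

# One pass over the structure fills three dicts keyed by a running idx.
second, person, camera = {}, {}, {}
idx = 0
for this_second, persons in file.items():
    for this_person, cameras in persons.items():
        for this_camera in cameras:
            second[idx] = this_second
            person[idx] = this_person
            camera[idx] = this_camera
            idx += 1


def get_item(idx):
    """Return the image for flat index idx via the three lookup dicts."""
    return file[second[idx]][person[idx]][camera[idx]]


print(len(second), get_item(3))  # prints: 4 img_b0
```

Because the keys are simply 0..N-1, the three dicts behave like three parallel lists, which is why they can be collapsed into a single table with one row per item.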

However, this solution seems too complicated to me. I'd like to know whether there is a better way to solve this, since it is probably a common problem.

Answer 1

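I agree, the dictionaries are too complicated. Instead, create an array whose first axis is the item index and whose second axis holds 3 values identifying the associated second, person, and camera. If you plan to do this often, you can store the array as a dataset in the HDF5 file and reuse that dataset.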

A sketch of this approach:

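import h5py
import numpy as np

your_hdf5_file = 'your_file.h5'  # placeholder: path to your HDF5 file

with h5py.File(your_hdf5_file, 'a') as h5f:
    # Loop on data: one row per (second, person, camera) image that exists
    rows = []
    for this_second, sec_grp in h5f.items():
        if not isinstance(sec_grp, h5py.Group):
            continue  # skip datasets at the root, e.g. an earlier 'indices'
        for this_person, per_grp in sec_grp.items():
            for this_camera in per_grp:
                rows.append([this_second, this_person, this_camera])
    # Create array for index values: shape (n_items, 3), fixed-length bytes
    # (h5py cannot store NumPy unicode 'U' arrays directly)
    idx_arr = np.array(rows, dtype='S')
    if 'indices' in h5f:
        del h5f['indices']
    h5f.create_dataset('indices', data=idx_arr)

with h5py.File(your_hdf5_file, 'r') as h5f:
    idx_ds = h5f['indices']
    for row_arr in idx_ds:
        # use row_arr values to get the next second/person/camera image
        this_second, this_person, this_camera = (v.decode() for v in row_arr)
        img = h5f[this_second][this_person][this_camera][()]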
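
To plug such an index into PyTorch, a map-style dataset only needs `__len__` and `__getitem__`. The class below is a minimal sketch, assuming the index is an (N, 3) array of byte-string group names (which is what h5py returns when reading a fixed-length bytes dataset) and that each camera entry is a dataset readable with `[()]`. The class name `IrregularH5Dataset` and the `open_file` parameter are hypothetical. In real use you would normally subclass `torch.utils.data.Dataset`; the sketch omits that so it runs without PyTorch.

```python
import numpy as np


class IrregularH5Dataset:
    """Map-style dataset: flat idx -> file[second][person][camera].

    Hypothetical sketch. `index` is an (N, 3) array of byte-string keys;
    `open_file` is a zero-argument callable returning the file object.
    """

    def __init__(self, index, open_file):
        self.index = np.asarray(index)
        self.open_file = open_file
        self._file = None  # opened lazily on first access by this object

    def __len__(self):
        # One item per (second, person, camera) row in the index
        return len(self.index)

    def __getitem__(self, idx):
        if self._file is None:
            self._file = self.open_file()
        # Decode the stored byte-string keys back to group/dataset names
        second, person, camera = (key.decode() for key in self.index[idx])
        # [()] reads the whole camera dataset into memory
        return self._file[second][person][camera][()]
```

With h5py, one option is to load the index once with `f["indices"][()]` inside a `with h5py.File(path, "r") as f:` block and pass `functools.partial(h5py.File, path, "r")` as `open_file` (a `partial` pickles more reliably than a lambda). Opening the file lazily means each DataLoader worker's copy of the dataset opens its own handle, which is the commonly recommended pattern, since h5py file handles are generally not safe to share across forked processes.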

Answer 1
0 2021-12-23 20:49:06
