
How to create an HDF5 file from NumPy dataset files

I have 1970 .npy files as features for the MSVD dataset. I want to create one .hdf5 file from these NumPy files.

import os 
import numpy as np
import h5py


TRAIN_FEATURE_DIR = "MSVD"   

for filename in os.listdir(TRAIN_FEATURE_DIR):
    f = np.load(os.path.join(TRAIN_FEATURE_DIR, filename))
...

Creating a dataset from an array is easy. The example below loops over all .npy files in a folder and creates one dataset for each array. (FYI, I prefer glob.iglob() to get the filenames using a wildcard.) The dataset name is the same as the filename.

import glob 
import numpy as np
import h5py

with h5py.File('SO_74788877.h5','w') as h5f:
    for filename in glob.iglob('*.npy'):
        arr = np.load(filename)
        h5f.create_dataset(filename,data=arr)

This code shows how to access the dataset names and values from the H5 file created above (dataset is a dataset object that behaves like a NumPy array in many instances):

with h5py.File('SO_74788877.h5','r') as h5f:
    for name, dataset in h5f.items():
        print(name, dataset.shape, dataset.dtype)
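The loop above prints only the shape and dtype. To get the actual values, a dataset can be indexed like an array. The following is a minimal sketch, assuming a small throwaway file; the path example.h5 and the dataset name vid1.npy are made up for illustration and are not from the original post.

```python
import os
import tempfile

import h5py
import numpy as np

# Build a small throwaway file so the sketch runs on its own
# (hypothetical file and dataset names).
tmp_dir = tempfile.mkdtemp()
h5_path = os.path.join(tmp_dir, "example.h5")
with h5py.File(h5_path, "w") as h5f:
    data = np.arange(6, dtype=np.float32).reshape(2, 3)
    h5f.create_dataset("vid1.npy", data=data)

with h5py.File(h5_path, "r") as h5f:
    dataset = h5f["vid1.npy"]
    full = dataset[()]      # read the whole dataset into a NumPy array
    first_row = dataset[0]  # indexing reads only the requested part
```

Note that the arrays must be read while the file is still open; full and first_row remain usable afterwards because they are ordinary NumPy arrays held in memory.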

The following code solved my problem:

import os
import numpy as np
import h5py


TRAIN_FEATURE_DIR = "MSVD"                    # MSVD ==> numpy folder path

# out ==> output hdf5 file name; the with block closes the file on exit
with h5py.File("out.hdf5", 'w') as h5:
    for filename in os.listdir(TRAIN_FEATURE_DIR):
        video_id = os.path.splitext(filename)[0]  # optional, to remove '.npy'
        video_id = video_id.split('.')[0]         # optional, to remove '.avi' from video_id

        f = np.load(os.path.join(TRAIN_FEATURE_DIR, filename))
        h5[video_id] = f                          # one dataset per video
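One thing to watch with os.listdir() is that it returns every entry in the folder, not only .npy files. The sketch below wraps the same approach in a function that skips other files and then checks the result by reading the file back. The helper name npy_dir_to_hdf5, the temporary folders, and the sample arrays are all hypothetical and exist only for illustration.

```python
import os
import tempfile

import h5py
import numpy as np


def npy_dir_to_hdf5(src_dir, out_path):
    """Hypothetical helper: write each .npy file in src_dir as one dataset."""
    names = []
    with h5py.File(out_path, "w") as h5:
        for filename in sorted(os.listdir(src_dir)):
            if not filename.endswith(".npy"):
                continue  # skip files that are not NumPy arrays
            # 'vid1.avi.npy' -> 'vid1.avi' -> 'vid1'
            video_id = os.path.splitext(filename)[0].split(".")[0]
            h5[video_id] = np.load(os.path.join(src_dir, filename))
            names.append(video_id)
    return names


# Round-trip check on two small made-up arrays plus one unrelated file.
src_dir = tempfile.mkdtemp()
np.save(os.path.join(src_dir, "vid1.avi.npy"), np.ones((2, 4)))
np.save(os.path.join(src_dir, "vid2.avi.npy"), np.zeros((3, 4)))
open(os.path.join(src_dir, "notes.txt"), "w").close()

out_path = os.path.join(tempfile.mkdtemp(), "out.hdf5")
written = npy_dir_to_hdf5(src_dir, out_path)

with h5py.File(out_path, "r") as h5:
    shapes = {name: h5[name].shape for name in h5}
```

Because the ID keeps only the text before the first dot, two files whose names share that prefix would map to the same dataset name, so it may be worth checking the IDs for duplicates before writing.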
    

