How to create an HDF5 file from NumPy dataset files
I have 1970 .npy files as features for the MSVD dataset. I want to create one .hdf5 file from these NumPy files.
import os
import numpy as np
import h5py  # the Python HDF5 package is h5py; there is no "hdf5" module
TRAIN_FEATURE_DIR = "MSVD"
for filename in os.listdir(TRAIN_FEATURE_DIR):
    f = np.load(os.path.join(TRAIN_FEATURE_DIR, filename))
    ...
Creating a dataset from an array is easy. The example below loops over all .npy files in a folder (here, the current working directory) and creates one dataset for each array. (FYI, I prefer glob.iglob() to get the filenames using a wildcard.) The dataset name is the same as the filename.
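import glob
import numpy as np
import h5py
with h5py.File('SO_74788877.h5', 'w') as h5f:
    # '*.npy' matches the .npy files in the current working directory
    for filename in glob.iglob('*.npy'):
        arr = np.load(filename)
        # one dataset per file, named after the file ('.npy' extension included)
        h5f.create_dataset(filename, data=arr)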
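A self-contained sketch of the same loop, assuming the features live in some folder other than the current directory (as with the MSVD folder in the question). The helper name npy_folder_to_h5 and the two demo files are hypothetical, made up only so the example runs on its own:

```python
import glob
import os
import tempfile

import h5py
import numpy as np


def npy_folder_to_h5(src_dir, out_path):
    """Store every .npy file in src_dir as one dataset in out_path.

    Each dataset is named after its source file (e.g. 'vid1.npy').
    Returns the list of dataset names that were written.
    """
    names = []
    with h5py.File(out_path, "w") as h5f:
        # sorted() only makes the dataset order deterministic
        for path in sorted(glob.iglob(os.path.join(src_dir, "*.npy"))):
            name = os.path.basename(path)  # dataset name = filename
            h5f.create_dataset(name, data=np.load(path))
            names.append(name)
    return names


# Usage: build a throwaway folder with two small (hypothetical) feature arrays.
tmp_dir = tempfile.mkdtemp()
np.save(os.path.join(tmp_dir, "vid1.npy"), np.arange(6, dtype=np.float32).reshape(2, 3))
np.save(os.path.join(tmp_dir, "vid2.npy"), np.ones((4, 3), dtype=np.float32))
written = npy_folder_to_h5(tmp_dir, os.path.join(tmp_dir, "features.h5"))
print(written)  # ['vid1.npy', 'vid2.npy']
```

Joining the folder path into the glob pattern keeps the script independent of the working directory; the output file can even sit in the same folder, since the pattern only matches .npy files.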
This code shows how to access the dataset names and properties from the H5 file created above. (dataset is an h5py Dataset object, which behaves like a NumPy array in many instances, so its values can be read by indexing or slicing):
with h5py.File('SO_74788877.h5', 'r') as h5f:
    for name, dataset in h5f.items():
        print(name, dataset.shape, dataset.dtype)
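To get the actual values back as NumPy arrays, the dataset has to be read while the file is still open. A minimal sketch, assuming the file holds only top-level datasets (as the loop above writes it); the file demo.h5 and the helper read_all are hypothetical:

```python
import os
import tempfile

import h5py
import numpy as np

# Hypothetical demo file; the answer above uses 'SO_74788877.h5'.
h5_path = os.path.join(tempfile.mkdtemp(), "demo.h5")
with h5py.File(h5_path, "w") as h5f:
    h5f.create_dataset("vid1.npy", data=np.arange(6).reshape(2, 3))


def read_all(path):
    """Return {dataset name: numpy array} for every top-level dataset."""
    arrays = {}
    with h5py.File(path, "r") as h5f:
        for name, dataset in h5f.items():
            # dataset[()] reads the full contents into memory as a numpy array;
            # slicing (e.g. dataset[0]) reads only the requested part.
            arrays[name] = dataset[()]
    return arrays


loaded = read_all(h5_path)
print(type(loaded["vid1.npy"]).__name__, loaded["vid1.npy"].shape)  # ndarray (2, 3)
```

Reading everything at once is fine for a few thousand small feature arrays; for large ones, slicing the Dataset object avoids loading data you do not need.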
The following code solved my problem:
import os
import numpy as np
import h5py
TRAIN_FEATURE_DIR = "MSVD"  # MSVD ==> numpy folder path
with h5py.File("out.hdf5", "w") as h5:  # out ==> output hdf5 file name
    for filename in os.listdir(TRAIN_FEATURE_DIR):
        if not filename.endswith(".npy"):
            continue  # skip anything that is not a saved numpy array
        video_id = os.path.splitext(filename)[0]  # optional, to remove '.npy'
        video_id = video_id.split('.')[0]  # optional, to remove '.avi' from video_id
        f = np.load(os.path.join(TRAIN_FEATURE_DIR, filename))
        h5[video_id] = f  # creates a dataset named video_id holding the array
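The two "optional" lines above turn a feature filename into the dataset key. A small sketch of that step as a standalone function, so its effect is easy to check; the function name and the example filenames are hypothetical:

```python
import os


def video_id_from_filename(filename):
    """Mirror the two optional lines above: drop '.npy', then any '.avi' etc."""
    video_id = os.path.splitext(filename)[0]  # 'abc_1_10.avi.npy' -> 'abc_1_10.avi'
    return video_id.split(".")[0]             # 'abc_1_10.avi'     -> 'abc_1_10'


print(video_id_from_filename("abc_1_10.avi.npy"))  # abc_1_10
print(video_id_from_filename("vid1.npy"))          # vid1
```

With datasets keyed this way, one video's features can later be read with h5[video_id][()] inside a with h5py.File("out.hdf5", "r") as h5: block. Note that split(".")[0] keeps only the text before the first dot, so an id that itself contains a dot would be truncated.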