Is there a way of removing rows from an HDF5 dataset?
I have created an h5py dataset with around 2.1 million instances. The issue is that I have filled every row except the last one. I want to remove the last row, but I am unsure whether that is feasible or safe to do.
This is a snippet of how the dataset is created:
    shape = (dataset_length, args.batch_size, 2048, 1, 1)
    with h5py.File(path, mode='a') as hdf5_file:
        # maxshape=(None, ...) leaves the first axis resizable
        array_40 = hdf5_file.create_dataset(
            f'{phase}_40x_arrays', shape, maxshape=(None, args.batch_size, 2048, 1, 1)
        )

    # either new or checkpointed file exists
    # load file and create references to existing h5 datasets
    with h5py.File(path, mode='r+') as hdf5_file:
        array_40 = hdf5_file[f'{phase}_40x_arrays']
        for i, (inputs40x, labels) in enumerate(dataloaders_dict):
            inputs40x = inputs40x.to(device)
            x40 = resnet(inputs40x)
            # batch_idx is not defined in this snippet; presumably the batch counter (i)
            array_40[batch_idx, ...] = x40.cpu()
            hdf5_file.flush()
I'm not really sure if I need to copy all instances to a new dataset. I tried resizing, but that didn't work...
Cheers,
Here is a very simple example to demonstrate dataset.resize() for one dataset.
    import numpy as np
    import h5py

    arr = np.random.rand(100).reshape(20, 5)

    # maxshape=(None, 5) makes the first axis resizable
    with h5py.File('SO_61487687.h5', mode='a') as h5f:
        h5f.create_dataset('array1', data=arr, maxshape=(None, 5))

    with h5py.File('SO_61487687.h5', mode='r+') as h5f:
        print('Before:', h5f['array1'].shape)
        # shrink axis 0 from 20 rows to 10; the first 10 rows are kept
        h5f['array1'].resize(10, axis=0)
        print('After:', h5f['array1'].shape)
        h5f.flush()
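The same call can be applied to the case in the question, where only the final row along the first axis should go. The following is a minimal sketch, not the asker's actual code: the helper `drop_last_row`, the file name `resize_demo.h5`, the dataset name `demo_arrays`, and the small shape are all made up for illustration. It assumes the dataset was created with `maxshape=(None, ...)`, as in the question, so that axis 0 is resizable.

```python
import os
import tempfile

import h5py
import numpy as np


def drop_last_row(h5_path, dataset_name):
    """Shrink a resizable dataset by one row along axis 0 and return its new shape.

    Hypothetical helper for illustration; not part of h5py.
    """
    with h5py.File(h5_path, mode='r+') as h5f:
        dset = h5f[dataset_name]
        # Shrinking keeps rows 0..n-2 and discards the last row.
        dset.resize(dset.shape[0] - 1, axis=0)
        return dset.shape


# Build a small stand-in file: 6 rows allocated, only the first 5 filled.
demo_path = os.path.join(tempfile.mkdtemp(), 'resize_demo.h5')
filled = np.random.rand(5, 4, 8, 1, 1).astype('f4')
with h5py.File(demo_path, mode='w') as h5f:
    dset = h5f.create_dataset(
        'demo_arrays', shape=(6, 4, 8, 1, 1),
        maxshape=(None, 4, 8, 1, 1), dtype='f4',
    )
    dset[:5] = filled

new_shape = drop_last_row(demo_path, 'demo_arrays')
print('After:', new_shape)  # After: (5, 4, 8, 1, 1)
```

Resizing this way leaves the remaining rows untouched, so no copy to a new dataset is needed. Note that shrinking a dataset does not necessarily shrink the file on disk; if file size matters, repacking the file (for example with the `h5repack` tool) is the usual way to reclaim the space.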