
Append Data in HDF5 file

I want to append new data to an HDF5 file I have already created, but I don't know the actual syntax for appending.

I created an HDF5 file to save my data like this:

import h5py

with h5py.File(save_path + 'PIC200829_256x256x256x3_fast_sj1.hdf5', 'w') as db:
    db.create_dataset('Predframes', data=trainX)
    db.create_dataset('GdSignal', data=trainY)
# This creates an HDF5 file with the given name
# and saves the data in the given format.

In the next iteration I want to append more data (of the same kind) to this file instead of overwriting it or creating a new HDF5 file. I know I have to change "w" to "a", but I don't know what to write in place of create_dataset to append.

Is db.append('Predframes', data=trainX) in place of db.create_dataset('Predframes', data=trainX) not the right syntax? What should I write to append instead of create?

trainX has shape (2500, 100, 100, 40), so when the next trainX with the same shape is appended to the first one, the dataset should have shape (5000, 100, 100, 40). trainY has shape (2500, 80), so after appending it should be (5000, 80).

Here is the required code. When the dataset is first created, it has to be declared resizable along its outermost dimension, which is done by passing None as the first entry of maxshape.

from os import path

import h5py
import numpy as np

def create_for_append(h5file, name, data):
    data = np.asanyarray(data)
    # maxshape=(None, ...) leaves the first axis unlimited so it can grow later
    return h5file.create_dataset(
        name, data=data, maxshape=(None,) + data.shape[1:])


filepath = path.join(save_path, 'PIC200829_256x256x256x3_fast_sj1.hdf5')
with h5py.File(filepath, 'w') as db:
    create_for_append(db, 'Predframes', trainX)
    create_for_append(db, 'GdSignal', trainY)
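
As a rough illustration of what maxshape changes, the sketch below compares a dataset created the way the question does with one created with an unlimited first axis. The helper is_appendable, the function compare_layouts, the dataset names, and the tiny array are all made up for the demo, and the file lives only in memory (h5py's 'core' driver with backing_store=False), so nothing is written to disk.

```python
import h5py
import numpy as np


def is_appendable(dataset):
    # Hypothetical helper: axis 0 can grow only if its maximum size is unlimited (None).
    return dataset.maxshape[0] is None


def compare_layouts():
    data = np.zeros((6, 3), dtype="f4")  # tiny stand-in array, not the real trainX
    # In-memory file: nothing is written to disk.
    with h5py.File("layout_demo.hdf5", "w", driver="core", backing_store=False) as db:
        fixed = db.create_dataset("fixed", data=data)
        growable = db.create_dataset("growable", data=data, maxshape=(None, 3))
        return {
            "fixed": (fixed.maxshape, fixed.chunks, is_appendable(fixed)),
            "growable": (growable.maxshape,
                         growable.chunks is not None,
                         is_appendable(growable)),
        }
```

The dataset created without maxshape reports a maximum shape equal to its current shape and no chunking, so it cannot be resized afterwards. A file written with the question's original code would therefore have to be recreated (or its data copied into a new resizable dataset) before appending.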

Then we can append the new data by resizing the dataset and writing the new data into the newly allocated range.

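def append_to_dataset(dataset, data):
    data = np.asanyarray(data)
    old_len = dataset.shape[0]
    # Grow the first axis, then write the new rows into the added range.
    # Using an explicit start index also behaves correctly for empty data.
    dataset.resize(old_len + len(data), axis=0)
    dataset[old_len:] = data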


with h5py.File(filepath, 'a') as db:
    append_to_dataset(db['Predframes'], trainX)
    append_to_dataset(db['GdSignal'], trainY)
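
To check the shapes asked about in the question, here is a minimal end-to-end sketch on small stand-in arrays. The sizes (5 rows instead of 2500), the file name, and the function demo_append are assumptions made for the demo; the two helpers repeat the ones above so the block runs on its own, and the file is kept in memory rather than written to disk.

```python
import h5py
import numpy as np


def create_for_append(h5file, name, data):
    data = np.asanyarray(data)
    # Unlimited first axis so the dataset can grow later.
    return h5file.create_dataset(
        name, data=data, maxshape=(None,) + data.shape[1:])


def append_to_dataset(dataset, data):
    data = np.asanyarray(data)
    old_len = dataset.shape[0]
    dataset.resize(old_len + len(data), axis=0)
    dataset[old_len:] = data


def demo_append():
    # Small stand-ins for trainX and trainY (hypothetical sizes).
    x = np.arange(5 * 4 * 4 * 2, dtype="f4").reshape(5, 4, 4, 2)
    y = np.arange(5 * 8, dtype="f4").reshape(5, 8)
    # In-memory file: nothing is written to disk.
    with h5py.File("append_demo.hdf5", "w", driver="core", backing_store=False) as db:
        create_for_append(db, "Predframes", x)
        create_for_append(db, "GdSignal", y)
        append_to_dataset(db["Predframes"], x)
        append_to_dataset(db["GdSignal"], y)
        shapes = (db["Predframes"].shape, db["GdSignal"].shape)
        # The appended rows should equal the data that was just written.
        tail_ok = bool(np.array_equal(db["Predframes"][5:], x))
    return shapes, tail_ok
```

With 5-row inputs, both datasets should end up with 10 rows while their remaining dimensions stay unchanged, which mirrors the (2500, ...) to (5000, ...) growth described in the question.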
