Deleting a subgroup from an HDF5 file in Python

Question

I am trying to delete a subgroup that I wrote to an HDF5 file using h5py in Python. For example, according to the documentation, the subgroup called "MyDataset" can be deleted with:

del subgroup["MyDataset"]

I did this, and the subgroup is indeed no longer accessible. However, the file does not shrink. My question: is it possible to recover the space from deleted subgroups using h5py without having to rewrite the remaining subgroups into a completely new file? Below is a small example that illustrates what I mean:

import numpy as np
import h5py

myfile = h5py.File('file1.hdf5', 'w')  # explicit mode: h5py 3.0+ defaults to read-only
data = np.random.rand(int(1e6))
myfile.create_dataset("MyDataSet", data=data)
myfile.close()

Then I open the file and remove the previous entry:

myfile = h5py.File('file1.hdf5', 'a')  # reopen the existing file for reading and writing
del myfile["MyDataSet"]

and if you try to get the data using:

myfile["MyDataSet"][()]  # raises KeyError; [()] replaces .value, which was removed in h5py 3.0

you will find that the data is no longer accessible. However, the size of the file is the same before and after calling del.
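As a side note to the check described above, here is a minimal sketch of confirming the deletion with a membership test instead of reading the data. The scratch path and the variable names are assumptions made for illustration; only `del` and the dataset name come from the example above.

```python
import os
import tempfile

import h5py
import numpy as np

# Hypothetical scratch file, used only for this illustration.
path = os.path.join(tempfile.mkdtemp(), "delete_demo.hdf5")

# Create a file containing a single dataset.
with h5py.File(path, "w") as f:
    f.create_dataset("MyDataSet", data=np.random.rand(1000))

# Reopen for modification and check membership around the deletion.
with h5py.File(path, "a") as f:
    present_before = "MyDataSet" in f
    del f["MyDataSet"]
    present_after = "MyDataSet" in f

# Measure the size only after the file has been closed.
size_after_close = os.path.getsize(path)
print(present_before, present_after, size_after_close)
```

The membership test avoids relying on an exception to detect that the entry is gone.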

Answer 1

del myfile["MyDataSet"] modifies the File object, but does not modify the underlying file1.hdf5 file. The file1.hdf5 file is not modified until myfile.close() is called.

If you use a with-statement, myfile.close() will be called automatically for you when Python leaves the with-statement:

import numpy as np
import h5py
import os

path = 'file1.hdf5'
with h5py.File(path, "w") as myfile:
    data = np.random.rand(int(1e6))
    myfile.create_dataset("MyDataSet", data=data)
    print(os.path.getsize(path))

with h5py.File(path, "a") as myfile:
    del myfile["MyDataSet"]
    try:
        myfile["MyDataSet"][()]  # [()] replaces .value, which was removed in h5py 3.0
    except KeyError as err:
        # print(err)
        pass

print(os.path.getsize(path))

prints

8002144         <-- original file size
2144            <-- new file size

Notice that the first time, opening the File in write mode ("w") creates a new file; the second time, opening the File in append mode ("a", the default in h5py versions before 3.0) allows reading the existing file and modifying it.
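The difference between the modes can be seen in a short sketch. The scratch path, the second dataset name "Other", and the variable names are hypothetical. The exact exception raised on a read-only write attempt is not assumed here, so the sketch catches a broad Exception.

```python
import os
import tempfile

import h5py
import numpy as np

# Hypothetical scratch location, so the example does not touch real data.
path = os.path.join(tempfile.mkdtemp(), "modes_demo.hdf5")

# "w" creates the file, truncating any existing file at that path.
with h5py.File(path, "w") as f:
    f.create_dataset("MyDataSet", data=np.arange(10))

# "a" opens the existing file for reading and writing (creating it if missing).
with h5py.File(path, "a") as f:
    f.create_dataset("Other", data=np.zeros(3))
    keys_after_append = sorted(f.keys())

# "r" opens the file read-only; attempts to write are rejected.
with h5py.File(path, "r") as f:
    try:
        f.create_dataset("Blocked", data=np.zeros(3))
        read_only_blocked = False
    except Exception:  # exact exception type may vary between h5py versions
        read_only_blocked = True

print(keys_after_append, read_only_blocked)
```

Passing the mode explicitly keeps the behavior the same regardless of which default the installed h5py version uses.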

Answer 1
4, accepted, 2016-03-31 09:43:40