
HDF5 file made with h5py on Python 2 is corrupted after opening with h5py on Python 3

Problem

I have a file created with h5py in Python 2.7.

These steps lead to corruption:

  1. I download a fresh copy of it from a collaborator using scp. It is complete, at 286MB.

  2. I check that it is readable by opening it with hdfview. This shows all the datasets and groups properly.

  3. I exit hdfview.

  4. I repeat steps 2 and 3 to ensure hdfview is not corrupting the file.

  5. I open IPython under Python 3.6 and run:

    import h5py
    f = h5py.File(filename,'r')
    g = f['/sol000']  # one group that should be there

I get KeyError: "Unable to open object (Object 'sol000' doesn't exist)"

  6. I call f.close() and exit IPython. I again open the file with hdfview and the entire structure is gone. The file is now 4KB.
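To pin down which of the steps above actually modifies the file, one option is to record its size and checksum right after the download and compare again after each step. The following is a minimal sketch using only the standard library; the helper names `fingerprint` and `changed_since` are hypothetical and not part of h5py or HDF5.

```python
import hashlib
import os


def fingerprint(path, chunk_size=1 << 20):
    """Return (size_in_bytes, sha256_hex) for the file at path."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # Read in chunks so a large file is never loaded into memory at once
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return os.path.getsize(path), digest.hexdigest()


def changed_since(path, before):
    """Return True if the file no longer matches an earlier fingerprint."""
    return fingerprint(path) != before
```

Calling `fingerprint(filename)` directly after the scp transfer and `changed_since(filename, before)` after each hdfview or IPython session would show at which step the file shrinks.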

I am able to open the file with h5py in Python 2 and access all the datasets, but I must use Python 3 for my code.

Systems

File created on Fedora 24 64-bit, Python 2.7, h5py 2.7.0

System trying to read it: Fedora 25 64-bit, Python 3.6, h5py 2.7.0

Minimal code showing that it should work

On system 1:

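import h5py
import numpy as np

f = h5py.File("file.hdf5", "w")
# Create the dataset (the intermediate group sol000 is created automatically)
f.create_dataset("/sol000/data", (100, 100), dtype=float)
# Write into the existing dataset; assigning to f["/sol000/data"] directly
# would try to create a second object with the same name and fail
f["/sol000/data"][...] = np.zeros([100, 100], dtype=float)
f.close()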

On system 2: Do steps 1-4.

import h5py
f = h5py.File("file.hdf5","r")
# Print the name of every group and dataset in the file
f.visit(lambda *x:print(x))
# output includes ('sol000/data',)
f.close()

The solution was to enforce libver=earliest. That is, the following code worked to open the file:

import h5py
f = h5py.File("file.hdf5","r",libver="earliest")
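Since the original file was destroyed by a plain read-mode open, a more cautious variant is to open a read-only copy instead of the only good download. The sketch below assumes this workflow; `open_protected_copy` is a hypothetical helper, and whether a write-protected copy would have prevented the truncation described in the question has not been verified. Only `h5py.File` and its `libver` keyword come from h5py itself.

```python
import os
import shutil
import stat
import tempfile

import h5py


def open_protected_copy(src, libver="earliest"):
    """Copy src into a fresh temp directory and open the copy read-only."""
    tmp_dir = tempfile.mkdtemp(prefix="h5copy_")
    dst = os.path.join(tmp_dir, os.path.basename(src))
    shutil.copy2(src, dst)
    # Remove write permission so an unexpected write fails
    # instead of silently shrinking the file
    os.chmod(dst, stat.S_IRUSR)
    return h5py.File(dst, "r", libver=libver)
```

For example, `f = open_protected_copy("file.hdf5")` followed by `"sol000" in f` checks for the expected group without touching the original download. The temporary directory is left in place, so it has to be removed by hand afterwards.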

I've discovered a possible inconsistency in the h5py documentation. It claims that

The “earliest” option means that HDF5 will make a best effort to be backwards compatible.

The default is “earliest”.

This can't be true if it only works when I explicitly set it. My collaborator, it turns out, created the corruptible file with an older version of the HDF5 C library.
