
Read the properties of an HDF file in Python

I have a problem reading an HDF file in pandas. As of now, I don't know the keys of the file.

How do I read the file [data.hdf] in such a case? Also, my file is .hdf, not .h5. Does that make a difference in terms of data fetching?

I see that you need a 'group identifier in the store':

pandas.io.pytables.read_hdf(path_or_buf, key, **kwargs)

I was able to get the metadata from PyTables:

File(filename=data.hdf, title='', mode='a', root_uep='/', filters=Filters(complevel=0, shuffle=False, fletcher32=False, least_significant_digit=None))
/ (RootGroup) ''
/UID (EArray(317,)) ''
  atom := StringAtom(itemsize=36, shape=(), dflt='')
  maindim := 0
  flavor := 'numpy'
  byteorder := 'irrelevant'
  chunkshape := (100,)
/X Y (EArray(8319, 2, 317)) ''
  atom := Float32Atom(shape=(), dflt=0.0)
  maindim := 0
  flavor := 'numpy'
  byteorder := 'little'
  chunkshape := (1000, 2, 100)

How do I make it readable via pandas?

First, the extension (.hdf or .h5) doesn't make any difference. Second, I'm not sure about pandas, but I read the HDF5 keys like this:

import h5py
h5f = h5py.File("test.h5", "r")
h5f.keys()

or

h5f.values()
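Note that keys() and values() only cover the direct children of the root group. As a minimal sketch, assuming the file may contain nested groups, the following walks the whole file with h5py's visititems and collects the shape and dtype of every dataset. The helper name list_datasets is made up for this illustration.

```python
import h5py


def list_datasets(path):
    """Return {dataset path: (shape, dtype name)} for every dataset in the file."""
    found = {}

    def visit(name, obj):
        # visititems calls this for every group and dataset; keep datasets only.
        if isinstance(obj, h5py.Dataset):
            found[name] = (obj.shape, str(obj.dtype))

    with h5py.File(path, "r") as h5f:
        h5f.visititems(visit)
    return found
```

Calling list_datasets("test.h5") gives a plain dict, so the result can be inspected after the file has been closed.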

Docs are here. However, you will not be able to directly read the format you show with pandas. You need to use PyTables to read it in. pandas can read the PyTables Table format directly, even without the metadata that pandas uses.
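As a rough sketch of the PyTables route, the function below reads the two EArrays listed in the question (/UID and /X Y) and builds a DataFrame from them. It assumes that the middle axis of the (8319, 2, 317) array holds the X and Y values and that the last axis lines up with the UID array; neither is confirmed by the question. The name earrays_to_frame and its parameters are invented for this example.

```python
import pandas as pd
import tables


def earrays_to_frame(path, uid_node="/UID", data_node="/X Y", axis_names=("X", "Y")):
    """Read the UID and data EArrays and return a wide DataFrame.

    Assumes the data array has shape (n_rows, len(axis_names), n_uid).
    """
    with tables.open_file(path, mode="r") as h5:
        uids = h5.get_node(uid_node).read()
        data = h5.get_node(data_node).read()

    # StringAtom values come back as bytes; decode them for readable labels.
    uid_labels = [u.decode("utf-8") for u in uids]

    # Flatten the last two axes; C order matches the (axis, uid) product order.
    columns = pd.MultiIndex.from_product([axis_names, uid_labels], names=["axis", "uid"])
    return pd.DataFrame(data.reshape(data.shape[0], -1), columns=columns)
```

With df = earrays_to_frame("data.hdf"), a single series can then be selected as df[("X", some_uid)]. Node names that contain a space, such as "X Y", may trigger a NaturalNameWarning from PyTables, which can be ignored here.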

pyhdf is an alternative option for HDF files in Python.

You can read the file and list its keys with:

from pyhdf.SD import SD  # "import pyhdf" alone does not load the SD submodule
hdf = SD('file.hdf')
hdf.datasets()
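Keep in mind that pyhdf wraps the HDF4 library, while h5py and PyTables read HDF5, so the right tool depends on the file's actual format rather than on its extension. A minimal sketch for checking this from the file signature, using only the standard library (the function name detect_hdf_version is made up here):

```python
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"
HDF4_SIGNATURE = b"\x0e\x03\x13\x01"


def detect_hdf_version(path):
    """Return "HDF4", "HDF5" or "unknown" based on the file signature."""
    with open(path, "rb") as f:
        if f.read(4) == HDF4_SIGNATURE:
            return "HDF4"
        # The HDF5 signature sits at offset 0, or at 512, 1024, 2048, ...
        # when the file starts with a user block.
        offset = 0
        while True:
            f.seek(offset)
            sig = f.read(8)
            if len(sig) < 8:
                return "unknown"
            if sig == HDF5_SIGNATURE:
                return "HDF5"
            offset = 512 if offset == 0 else offset * 2
```

If this reports "HDF5" for data.hdf, pyhdf will not open the file and h5py or PyTables is the way to go.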

I hope it helps! Good luck.

You can use this simple function to see the variable names of any HDF file (it only works for variables stored as scientific datasets, i.e. those reachable through the SD interface):

from pyhdf.SD import SD, SDC

def HDFvars(File):
    """
    Extract variable names for an hdf file
    """
    hdfFile = SD(File, SDC.READ)  # open read-only (SDC.READ == 1)
    # datasets() returns a dict keyed by dataset name
    k = sorted(hdfFile.datasets().keys())
    hdfFile.end() # close the file
    return k

If the variables aren't in the scientific mode, you can try pyhdf.V with the following program, which shows the contents of the vgroups contained in any HDF file.

from pyhdf.HDF import *
from pyhdf.V   import *
from pyhdf.VS  import *
from pyhdf.SD  import *

def describevg(refnum):
    # Describe the vgroup with the given refnum.
    # Open vgroup in read mode.
    vg = v.attach(refnum)
    print("----------------")
    print("name:", vg._name, "class:", vg._class, "tag,ref:",
          vg._tag, vg._refnum)

    # Show the number of members of each main object type.
    print("members: ", vg._nmembers,
          "datasets:", vg.nrefs(HC.DFTAG_NDG),
          "vdatas:  ", vg.nrefs(HC.DFTAG_VH),
          "vgroups: ", vg.nrefs(HC.DFTAG_VG))

    # Read the contents of the vgroup.
    members = vg.tagrefs()

    # Display info about each member.
    for index, (tag, ref) in enumerate(members):
        print("member index", index)
        # Vdata tag
        if tag == HC.DFTAG_VH:
            vd = vs.attach(ref)
            nrecs, intmode, fields, size, name = vd.inquire()
            print("  vdata:", name, "tag,ref:", tag, ref)
            print("    fields:", fields)
            print("    nrecs:", nrecs)
            vd.detach()

        # SDS tag
        elif tag == HC.DFTAG_NDG:
            sds = sd.select(sd.reftoindex(ref))
            name, rank, dims, dtype, nattrs = sds.info()
            print("  dataset:", name, "tag,ref:", tag, ref)
            print("    dims:", dims)
            print("    type:", dtype)
            sds.endaccess()

        # Vgroup tag
        elif tag == HC.DFTAG_VG:
            vg0 = v.attach(ref)
            print("  vgroup:", vg0._name, "tag,ref:", tag, ref)
            vg0.detach()

        # Unhandled tag
        else:
            print("unhandled tag,ref", tag, ref)

    # Close vgroup
    vg.detach()

# Open HDF file in readonly mode.
filename = 'yourfile.hdf'
hdf = HDF(filename)

# Initialize the SD, V and VS interfaces on the file.
sd = SD(filename)
vs = hdf.vstart()
v  = hdf.vgstart()

# Scan all vgroups in the file.
ref = -1
while True:
    try:
        ref = v.getid(ref)
        print(ref)
    except HDF4Error:    # no more vgroup
        break
    describevg(ref)

# Terminate the V, VS and SD interfaces and close the file.
v.end()
vs.end()
sd.end()
hdf.close()
