
An XML file inside HDF5, h5py

I am using h5py to save data (float numbers) in groups. In addition to the data itself, I need to include an additional file (an .xml file containing necessary information) within the HDF5 file. How do I do this? Is my approach wrong?

import h5py

f = h5py.File('filename.h5', 'w')
f.create_dataset('/data/1', data=numpy_array_1)
f.create_dataset('/data/2', data=numpy_array_2)
.
.

My h5 tree should look like this:

/ 
/data
/data/1 (numpy_array_1)
/data/2 (numpy_array_2)
.
.
/morphology.xml (?)

One option is to add it as a variable-length string dataset.

http://code.google.com/p/h5py/wiki/HowTo#Variable-length_strings

E.g.:

import h5py
xmldata = """<xml>
<something>
    <else>Text</else>
</something>
</xml>
"""

# Write the xml file...
f = h5py.File('test.hdf5', 'w')
# Variable-length string type; h5py.new_vlen(str) is the deprecated spelling
str_type = h5py.special_dtype(vlen=str)
ds = f.create_dataset('something.xml', shape=(1,), dtype=str_type)
ds[:] = xmldata
f.close()

# Read the xml file back...
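f = h5py.File('test.hdf5', 'r')
value = f['something.xml'][0]
# Newer h5py versions return bytes here, so decode before printing if needed
print(value.decode('utf-8') if isinstance(value, bytes) else value)
f.close()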

If you just need to attach the XML file to the HDF5 file, you can add it as an attribute of the HDF5 file.

# h5f is an already open, writable h5py.File
with open('morphology.xml', 'rb') as xmlfh:
    h5f.attrs['xml'] = xmlfh.read()

You can then access the XML like this:

h5f.attrs['xml']

Note also that you can't store attributes larger than 64K, so you may want to compress the file before attaching it. Have a look at the compression libraries in Python's standard library.
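As a rough sketch of the compression idea, something like the following could work. The helper names `pack_xml` and `unpack_xml` and the sample XML are made up for illustration, and the base64 step is my own assumption: it keeps the compressed payload as plain ASCII text so it can be stored like any other string attribute, at the cost of roughly a third more space.

```python
import base64
import zlib

ATTR_LIMIT = 64 * 1024  # the 64K attribute limit mentioned above


def pack_xml(xml_bytes):
    """Compress XML and base64-encode it so the result is plain ASCII text."""
    packed = base64.b64encode(zlib.compress(xml_bytes, 9))
    if len(packed) >= ATTR_LIMIT:
        raise ValueError("still too large for an attribute; use a dataset instead")
    return packed.decode('ascii')


def unpack_xml(packed):
    """Reverse pack_xml and return the original XML bytes."""
    return zlib.decompress(base64.b64decode(packed))


# Hypothetical stand-in for a large XML file (about 170 KB uncompressed)
xml_bytes = b"<xml>" + b"<else>Text</else>" * 10000 + b"</xml>"
packed = pack_xml(xml_bytes)

# With an open, writable h5py.File named h5f, attaching and reading would be:
#     h5f.attrs['xml'] = packed
#     xml_bytes = unpack_xml(h5f.attrs['xml'])
```

If the packed text is still over the limit, the variable-length string dataset approach is probably the better fit.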

However, this doesn't make the information in the XML file very accessible. If you want to associate the metadata of each dataset with some metadata in the XML file, you could map it as needed using an XML library like lxml. You can also add each field of the XML data as a separate attribute so that you can query datasets by XML field. This all depends on what you have in the XML file. Try to think about how you would like to retrieve the data later.
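To illustrate the one-attribute-per-field idea, here is a minimal sketch. It uses the standard library's `xml.etree.ElementTree` instead of lxml (the two share most of their element API), and the function name `xml_fields` and the dot-separated naming scheme are just assumptions for the example. Repeated sibling tags would overwrite each other in this simple version.

```python
import xml.etree.ElementTree as ET


def xml_fields(xml_text):
    """Return {'dotted.path.to.tag': text} for every leaf element with text."""
    root = ET.fromstring(xml_text)
    fields = {}

    def walk(elem, path):
        children = list(elem)
        # Only leaf elements with non-blank text become fields
        if not children and elem.text and elem.text.strip():
            fields[path] = elem.text.strip()
        for child in children:
            walk(child, path + '.' + child.tag)

    walk(root, root.tag)
    return fields


xmldata = """<xml>
<something>
    <else>Text</else>
</something>
</xml>
"""
fields = xml_fields(xmldata)

# With an h5py dataset or group named obj, each field could then be attached:
#     for name, value in fields.items():
#         obj.attrs[name] = value
```

For the sample XML above this yields a single field, `xml.something.else`, holding `Text`.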

You may also want to create a group for each XML file together with its datasets and put it all in a single HDF5 file. I don't know how large the files you are managing are, so YMMV.
