
PyTables vs Matlab HDF5 read times

I have an HDF5 output file from NASTRAN that contains mode shape data. I am trying to read them into Matlab and Python to check various post-processing techniques. The file in question is in the local directory for both of these tests. The file is semi-large at 1.2 GB, but certainly not that large in terms of HDF5 files I have read previously. There are 17567342 rows and 8 columns in the table I want to access. The first and last columns are integers; the middle 6 are floating point numbers.

Matlab:

file = 'HDF5.h5';
hinfo = hdf5info(file);
% ... Find the dataset I want to extract
t = hdf5read(file, '/NASTRAN/RESULT/NODAL/EIGENVECTOR');

This last operation is extremely slow (it can be measured in hours).

Python:

import tables
hfile = tables.open_file("HDF5.h5")
modetable = hfile.root.NASTRAN.RESULT.NODAL.EIGENVECTOR
data = modetable.read()

This last operation is basically instant. I can then access data as if it were a numpy array. I am clearly missing something very basic about what these commands are doing. I'm thinking it might have something to do with data conversion, but I'm not sure. If I do type(data) I get back numpy.ndarray, and type(data[0]) returns numpy.void.
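Those numpy.void rows are just records of a structured array, and each column can be pulled out by field name. A minimal sketch, assuming a dtype that mirrors the 8-column table layout described above (a small zero-filled array stands in for the real modetable.read() result):

```python
import numpy as np

# Structured dtype matching the table: integer ID, six floats, integer DOMAIN_ID
eigen_dtype = np.dtype([('ID', np.int64), ('X', np.float64), ('Y', np.float64),
                        ('Z', np.float64), ('RX', np.float64), ('RY', np.float64),
                        ('RZ', np.float64), ('DOMAIN_ID', np.int64)])

# Stand-in for data = modetable.read()
data = np.zeros(3, dtype=eigen_dtype)
data['ID'] = [1, 2, 3]

print(type(data))      # the whole table is a numpy.ndarray
print(type(data[0]))   # a single row is a numpy.void record
print(data['X'])       # one field across all rows, as a plain float array
```

So the numpy.void result is expected: indexing a structured array by row gives a record, while indexing by field name gives an ordinary column array.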

What is the correct (i.e. speedy) way to read the dataset I want into Matlab?

Matt, are you still working on this problem? I am not a Matlab guy, but I am familiar with the Nastran HDF5 file. You are right; 1.2 GB is big, but not that big by today's standards.
You might be able to diagnose the Matlab performance bottleneck by running tests with different numbers of rows in your EIGENVECTOR dataset. To do that (without running a lot of Nastran jobs), I created some simple code to create an HDF5 file with a user-defined number of rows. It mimics the structure of the Nastran Eigenvector Result dataset. See below:

import tables as tb
import numpy as np
hfile = tb.open_file('SO_54300107.h5','w')

eigen_dtype = np.dtype([('ID',int), ('X',float),('Y',float),('Z',float),
                        ('RX',float),('RY',float),('RZ',float), ('DOMAIN_ID',int)])

fsize = 1000.0          # number of rows to generate (vary this to test scaling)
isize = int(fsize)
recarr = np.recarray((isize,), dtype=eigen_dtype)

id_arr = np.arange(1, isize + 1)           # node IDs 1..N
dom_arr = np.ones((isize,), dtype=int)     # constant DOMAIN_ID
arr = np.array(np.arange(fsize)) / fsize   # dummy eigenvector components

recarr['ID'] = id_arr
recarr['X'] = arr
recarr['Y'] = arr
recarr['Z'] = arr
recarr['RX'] = arr
recarr['RY'] = arr
recarr['RZ'] = arr
recarr['DOMAIN_ID'] = dom_arr

modetable = hfile.create_table('/NASTRAN/RESULT/NODAL', 'EIGENVECTOR',
                                 createparents=True, obj=recarr )

hfile.close()

Try running this with different values of fsize (number of rows), then read the HDF5 file it creates into Matlab. Maybe you can find the point where performance noticeably degrades.
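To locate that degradation point on the Python side for comparison, each read can be timed with the standard library. A minimal sketch; time_call is a hypothetical helper name, and the commented PyTables line shows how it would wrap the actual table read:

```python
import time

def time_call(fn, *args, **kwargs):
    """Return (elapsed seconds, result) for a single call to fn."""
    t0 = time.perf_counter()
    result = fn(*args, **kwargs)
    return time.perf_counter() - t0, result

# Cheap stand-in workload; with PyTables you would time something like:
#   elapsed, data = time_call(hfile.root.NASTRAN.RESULT.NODAL.EIGENVECTOR.read)
elapsed, total = time_call(sum, range(1_000_000))
```

Plotting elapsed time against fsize for both readers should make any nonlinear slowdown obvious.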

Matlab provides another HDF5 reader called h5read. Using the same basic approach, the amount of time taken to read the data was drastically reduced. In fact, hdf5read is listed for removal in a future version. Here is the same basic code with the preferred functions.

file = 'HDF5.h5';
hinfo = h5info(file);
% ... Find the dataset I want to extract
t = h5read(file, '/NASTRAN/RESULT/NODAL/EIGENVECTOR');
