
Reading and Operating on an HDF5 File

I found a similar question at HDF5 Example code
but I'm having trouble viewing the HDF5 dataset contents correctly.

The dataset I'm looking at contains string headers, with strings in the first column and doubles in the others.

Here's what my code looks like:

public static void readh5(string path, string filename)
{
    H5.Open();
    var fileID = H5F.open(path + filename, H5F.OpenMode.ACC_RDONLY);

    var groupID = H5G.open(fileID, "/Example Group/");
    var datasetID = H5D.open(groupID, "Events");
    var dataSpace = H5D.getSpace(datasetID);
    var size = H5S.getSimpleExtentDims(dataSpace);
    var dataType = H5D.getType(datasetID);

    double[,] dataArray = new double[size[0],11];
    var wrapArray = new H5Array<double>(dataArray);
    H5D.read(datasetID, dataType, wrapArray);
    Console.WriteLine(wrapArray);
}

When I debug and look into wrapArray, each element is an incredibly large or small double, from 10^300 to 10^-300 in value, and I don't know why. I don't think those are ID numbers of the elements. I've tried changing the datatype of wrapArray and dataArray to object, but that still doesn't give me the exact contents of the dataset.

The output I'm getting for wrapArray looks like:

[0,0] 4.0633928641260729E+87  
[0,1] 9.77854726248995E-320  
[0,2] 1.52021104712121E-312  

etc.

But what I want is:

[0,0] Event1  
[0,1] 2  
[0,2] 56  

etc.

After reading in the dataset I want to loop through the first column to find specific strings, and get the corresponding elements in the other columns. But first I have to figure this out.

For me, it worked to simply check the actual datatype of the dataset (using HDFView) and then create an array of that datatype instead of an array of doubles.
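A minimal sketch of why the wrong in-memory type produces such values: when the bytes of a string field are reinterpreted as a double, the result is an arbitrary bit pattern, often with an extreme exponent. The class name, the sample text "Event1", and the 8-byte width are made up for illustration and are not taken from the actual file; no HDF5 calls are involved.

```csharp
using System;
using System.Text;

public static class ReinterpretDemo
{
    // Copies the ASCII bytes of `text` into an 8-byte, zero-padded buffer
    // and interprets that buffer as a double. The 8-byte width is an
    // assumption chosen only because it equals sizeof(double).
    public static double StringBytesAsDouble(string text)
    {
        byte[] raw = new byte[8];
        byte[] ascii = Encoding.ASCII.GetBytes(text);
        Array.Copy(ascii, raw, Math.Min(ascii.Length, raw.Length));
        return BitConverter.ToDouble(raw, 0);
    }
}
```

Calling `ReinterpretDemo.StringBytesAsDouble("Event1")` returns a meaningless number rather than the text, which is the same kind of value seen in wrapArray when string data is read into a double array.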

John, if a dataset has one column filled with string values and a second column with double values, then the dataset is of the "COMPOUND" type. Things are a little more complicated in that case, and (from what I know today; I am a newbie to HDF5) it is not possible to simply load the values into a 2D array. Instead, you have to:

// 1) Define a byte array in memory. This assumes each row is one string and two doubles.
//    Check (e.g. in HDFView) that the string in the dataset really is 256 bytes long.
int rows = (int)size[0];           // number of rows in the dataset (cast in case the dims are long values)
int oneRowDataSize = 256 + 8 + 8;  // string + double + double
byte[] data_to_read = new byte[oneRowDataSize * rows];

// 2) Read the data into the byte array
H5D.read(datasetID, dataType, new H5Array<byte>(data_to_read));

// 3) Decompose the byte array into rows and individual values
for (int m = 0; m < rows; m++)
{
    // 4) Offset of the row in the byte array
    int pos = m * oneRowDataSize;

    // 5) Compute the individual offsets
    int posString = pos;
    int posDouble1 = pos + 256; // change 256 to the actual size of the string in the dataset
    int posDouble2 = pos + 256 + 8;

    // 6) Convert the bytes to values (TrimEnd assumes the fixed-length string is zero-padded)
    string valString = Encoding.UTF8.GetString(data_to_read, posString, 256).TrimEnd('\0');
    double valDouble1 = BitConverter.ToDouble(data_to_read, posDouble1);
    double valDouble2 = BitConverter.ToDouble(data_to_read, posDouble2);

    // 7) Use these values in your C# lists/arrays...
}

I did not test this code. It was just rewritten from mine for your case. Hope this will help.

Filip
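The decoding steps in the answer, followed by the lookup the question asks for, might be sketched as below. This works on an in-memory byte array only and makes no HDF5 calls. The `EventRow` and `CompoundDecoder` names, the `Encode` helper, and the layout (a 256-byte zero-padded string followed by two doubles, with no padding between fields) are assumptions for illustration; the real layout has to be taken from the compound type shown in HDFView.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical row type: one name plus two numeric values.
public sealed class EventRow
{
    public string Name;
    public double Value1;
    public double Value2;
}

public static class CompoundDecoder
{
    public const int StringSize = 256;              // assumed fixed string length
    public const int RowSize = StringSize + 8 + 8;  // string + double + double

    // Splits a packed buffer into rows, assuming no padding between fields.
    public static List<EventRow> Decode(byte[] buffer)
    {
        if (buffer.Length % RowSize != 0)
            throw new ArgumentException("Buffer length is not a multiple of the row size.");

        var rows = new List<EventRow>(buffer.Length / RowSize);
        for (int pos = 0; pos < buffer.Length; pos += RowSize)
        {
            rows.Add(new EventRow
            {
                // The string is assumed to be zero-padded; trim the padding.
                Name = Encoding.ASCII.GetString(buffer, pos, StringSize).TrimEnd('\0'),
                Value1 = BitConverter.ToDouble(buffer, pos + StringSize),
                Value2 = BitConverter.ToDouble(buffer, pos + StringSize + 8)
            });
        }
        return rows;
    }

    // Returns the first row whose name matches exactly, or null if none does.
    public static EventRow Find(List<EventRow> rows, string name)
    {
        return rows.Find(r => r.Name == name);
    }

    // Builds one packed row; only used to produce sample input for Decode.
    public static byte[] Encode(string name, double v1, double v2)
    {
        var row = new byte[RowSize];
        byte[] text = Encoding.ASCII.GetBytes(name);
        Array.Copy(text, row, Math.Min(text.Length, StringSize));
        BitConverter.GetBytes(v1).CopyTo(row, StringSize);
        BitConverter.GetBytes(v2).CopyTo(row, StringSize + 8);
        return row;
    }
}
```

With a buffer filled by H5D.read, `CompoundDecoder.Decode(buffer)` would give a list that can be searched with `CompoundDecoder.Find(rows, "Event1")`. Keeping the decoding separate from the file access makes it possible to check the offsets against hand-built sample data before touching the real file.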


 