Read binary file matrix with header in C++ (which is Fortran output)

I have a binary file that has something like a header, followed by the actual matrix. The binary file is Fortran output, structured like:

      OPEN(UNIT=IUN43,..., FORM='UNFORMATTED',ACCESS='DIRECT',RECL=IRETLEN) 
      WRITE(IUN43,REC=1) "Description text"
      WRITE(IUN43,REC=2) IRECL, ICOLS, IROWS
      WRITE(IUN43,REC=3) I1,I2
      WRITE(IUN43,REC=4) DLAT, DLON, DH
      WRITE(IUN43,REC=5) ITY,ITM,ITD
      WRITE(IUN43,REC=6) I3,I4,I5
      WRITE(IUN43,REC=7) DF
C Loop over IRETREC     
      WRITE(IUN43,17023,REC=IRETREC) I1,I2,D1,D2,D3,...

Everything starting with I is an integer, everything starting with D is a double. It's very important legacy code, so there's nothing I can do about it. Each REC is a separate record (a "line") in the output file. My only experience with binary files is with Fortran.
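
For reference, a direct-access unformatted file contains no line breaks: it is a sequence of fixed-length records, and record n starts at byte offset (n-1)*RECL. With gfortran, RECL for unformatted files is counted in bytes (some other compilers count 4-byte words by default). A minimal sketch of fetching one record in C++, assuming the record length is known to the reader; `record_offset` and `read_record` are hypothetical helper names:

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <istream>
#include <vector>

// Byte offset of 1-based record `rec` in a direct-access file whose
// records are all `recl` bytes long (no record markers, no newlines).
std::streamoff record_offset(std::size_t rec, std::size_t recl)
{
    assert(rec >= 1);  // Fortran record numbers start at 1
    return static_cast<std::streamoff>((rec - 1) * recl);
}

// Read one whole record into a byte buffer.
// Returns an empty vector if the record cannot be read completely.
std::vector<char> read_record(std::istream& in, std::size_t rec, std::size_t recl)
{
    std::vector<char> buf(recl);
    in.clear();  // drop a failure state left over from an earlier call
    in.seekg(record_offset(rec, recl), std::ios::beg);
    if (!in.read(buf.data(), static_cast<std::streamsize>(recl)))
        buf.clear();
    return buf;
}
```

With a stream opened in std::ios::binary mode, read_record(stream, 2, recl) would return the bytes that hold IRECL, ICOLS and IROWS. Line-oriented calls such as std::getline do not fit this layout, because a record is not terminated by a newline.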

I am trying to read this in my C++ program. After reading tutorials and answers on Stack Overflow, I finally have something like:

      std::ifstream ifile(TimeSeriesDirectory1+TimeSeriesFileName1, std::ios::binary);
      if (ifile.good())
      {
      // get length of file:
      ifile.seekg (0, ifile.end);
      int filelength = ifile.tellg();
      ifile.seekg (0, ifile.beg);
      char * buffer = new char [filelength];

      std::cout << "Reading " << filelength << " characters...\n";
      std::string line;
      while (std::getline(ifile, line))
      {
       std::istringstream iss(line);
       std::cout << line.substr(0,20) <<std::endl;
       //Only 20 characters to see how and if it works
      };

I tried with ifile.read (line,filelength); and line=const_cast<char*>(buffer); but then the code appears to read only 1 line.

My intention is that the code reads the header lines (1-7), which will later be copied to the output. It could read IRECL, ICOLS, IROWS from the header, since the latter two are the numbers of columns and rows. Currently I'm stuck at just reading the data, since the output is:

    Description text
    ^@^@^@^E^@^@^@y^p2^A^A^@^@^@
    :T   ^b      ^F^\^H         
    ^YN^u4@^xj^#?   ^s?A   |^p^o

etc.

I learnt that if you know the file holds a fixed number of doubles, you can use a buffer for it. But I don't know the number of columns or rows before running the program. Also, keeping the header is very important.
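
A buffer does not need a compile-time size: once the column count is known from the header, a std::vector can be sized at run time. A minimal sketch, assuming the values are stored as contiguous 8-byte doubles in the byte order of the machine that reads them; `read_doubles` is a hypothetical helper name:

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <istream>
#include <vector>

// Read `count` consecutive doubles from the current stream position.
// `count` is only known at run time (for example, taken from the header).
// Returns an empty vector if the stream runs short.
std::vector<double> read_doubles(std::istream& in, std::size_t count)
{
    assert(count > 0);
    std::vector<double> values(count);
    in.read(reinterpret_cast<char*>(values.data()),
            static_cast<std::streamsize>(count * sizeof(double)));
    if (!in)
        values.clear();
    return values;
}
```

Reading a whole row in one call like this avoids the per-number text parsing of a stringstream, which is usually where the time goes for files of this size.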

I already have this solved using stringstream (see below for reference), but it results in 20G-60G matrices, so anything that speeds up the process and saves disk space would be appreciated. I use Linux (g++ (SUSE Linux) 7.5.0, gfortran GNU Fortran (SUSE Linux) 7.5.0). It came to my mind that maybe I could use casting operators, but that appeared too confusing and I couldn't find out how to do it.
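
On the casting idea: the bytes of a record can be turned into numbers without any text parsing. A minimal sketch, assuming the record is already in memory as raw bytes, default gfortran sizes (4-byte INTEGER, 8-byte DOUBLE PRECISION) and matching byte order. `load`, `Row` and `decode_row` are hypothetical names, and only the first five fields of the data record (I1, I2, D1, D2, D3) are decoded. It uses std::memcpy rather than a pointer cast, which sidesteps alignment and aliasing problems:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <type_traits>
#include <vector>

// Copy sizeof(T) bytes starting at `offset` out of a raw record.
template <typename T>
T load(const std::vector<char>& record, std::size_t offset)
{
    static_assert(std::is_trivially_copyable<T>::value, "T must be a plain value type");
    assert(offset + sizeof(T) <= record.size());  // stay inside the record
    T value;
    std::memcpy(&value, record.data() + offset, sizeof(T));
    return value;
}

// First five fields of one data record (layout assumed, see above).
struct Row {
    std::int32_t i1, i2;
    double d1, d2, d3;
};

Row decode_row(const std::vector<char>& record)
{
    Row r;
    r.i1 = load<std::int32_t>(record, 0);
    r.i2 = load<std::int32_t>(record, 4);
    r.d1 = load<double>(record, 8);
    r.d2 = load<double>(record, 16);
    r.d3 = load<double>(record, 24);
    return r;
}
```

The offsets follow from the field sizes: Fortran writes the items of an unformatted record back to back, without the padding a C++ struct would insert.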

    std::stringstream ss(line);
    while (ss>>number) 
    {
      ++columns;
      LineOfTimeSeriesFile.push_back(number);
    };

Any help is appreciated.

Fun problem. Hope this helps.

Two approaches are below. The first reads the file by brute force. The second is helpful if you have Fortran code that reads the file.

Approach 1


#include <fstream>
#include <iostream>
#include <string>

using namespace std;

#define F90_REAL_SIZE (sizeof(float))
float read_fortran_real( ifstream &stream )
{
    float tmp = 0.0;
    stream.read( (char*)&tmp, F90_REAL_SIZE );
    return tmp;
}

#define F90_INT_SIZE (sizeof(int))
int read_fortran_int( ifstream &stream )
{
    int tmp = 0;
    stream.read( (char*)&tmp, F90_INT_SIZE );
    return tmp;
}

int main()
{
    int iretlen = 0;
    ifstream stream( "data.raw", ios::in | ios::binary );  // open for reading, not writing
    if(!stream) {
        cout << "Cannot open file!" << endl;
        return 1;
    }

    string hdrstr;
    getline( stream, hdrstr, '\0' );
    cout<<"HEADER:"<<hdrstr<<endl;
    iretlen = hdrstr.length() + 1;
    while( stream.peek() == '\0' )
    {
        iretlen++;
        stream.get( );
    }
    cout<<"IRETLEN:"<<iretlen<<endl;

    int irecl = read_fortran_int( stream );
    int icols = read_fortran_int( stream );
    int irows = read_fortran_int( stream );
    stream.seekg( iretlen - 12, ios_base::cur );

    int i1 = read_fortran_int( stream );
    int i2 = read_fortran_int( stream );
    stream.seekg( iretlen - 8, ios_base::cur );

    float dlat = read_fortran_real( stream );
    float dlon = read_fortran_real( stream );
    float  dh = read_fortran_real( stream );
    stream.seekg( iretlen - 12, ios_base::cur );

    int ity = read_fortran_int( stream );
    int itm = read_fortran_int( stream );
    int itd = read_fortran_int( stream );
    stream.seekg( iretlen - 12, ios_base::cur );

    int  i3 = read_fortran_int( stream );
    int  i4 = read_fortran_int( stream );
    int  i5 = read_fortran_int( stream );
    stream.seekg( iretlen - 12, ios_base::cur );

    float  df = read_fortran_real( stream );
    stream.seekg( iretlen - 4, ios_base::cur );

    for ( int i = 0 ; i < 2 ; i++ )
    {
        int i1 = read_fortran_int( stream );
        int i2 = read_fortran_int( stream );
        float d1 = read_fortran_real( stream );
        float d2 = read_fortran_real( stream );
        float d3 = read_fortran_real( stream );
        // 20 bytes exactly
        stream.seekg( iretlen - 20, ios_base::cur );
    }

    stream.close();

    cout<<"IRECL:"<<irecl<<endl;
    cout<<"ICOLS:"<<icols<<endl;
    cout<<"IROWS:"<<irows<<endl;
    cout<<"DLAT:"<<dlat<<endl;
    cout<<"DLON:"<<dlon<<endl;
    cout<<"DH:"<<dh<<endl;

    return 0;
}
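
Note that the code above reads 4-byte reals, which matches the test generator at the end of this answer, while the question says the D values are doubles. A hedged sketch of a generic reader that covers both cases, assuming default gfortran sizes and native byte order; `read_value` and `skip_rest` are hypothetical names:

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <istream>
#include <type_traits>

// Read one value of type T from the current stream position.
// read_value<double> for DOUBLE PRECISION, read_value<float> for REAL,
// read_value<int> for default INTEGER (sizes assumed, check compiler flags).
template <typename T>
T read_value(std::istream& stream)
{
    static_assert(std::is_arithmetic<T>::value, "numeric types only");
    T tmp = T();
    stream.read(reinterpret_cast<char*>(&tmp), sizeof(T));
    return tmp;
}

// Skip the unused tail of a record after `used` bytes have been consumed.
void skip_rest(std::istream& stream, std::size_t recl, std::size_t used)
{
    assert(used <= recl);
    stream.seekg(static_cast<std::streamoff>(recl - used), std::ios::cur);
}
```

With doubles, record 4 would then be read as three read_value<double> calls followed by skip_rest(stream, iretlen, 24).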

Approach 2

Use the existing Fortran code and call it from C++.


// FILE: try6C.cpp
#include <cstdio>
#include <iostream>

using namespace std;

extern"C" {
void read_header_( char* HDRSTR, int* ICOLS, int* IROWS, int* I1, int* I2, float* DLAT, float* DLON, float* DH, int* ITY, int* ITM, int* ITD, int* I3, int* I4, int* I5, float* DF, float* D1, float* D2, float* D3 );
}

int main()
{
   char HDRSTR[17] = { 0 };
   int ICOLS, IROWS, I1, I2;
   float DLAT, DLON, DH;
   int ITY, ITM, ITD, I3, I4, I5;
   float DF, D1, D2, D3;

   read_header_( HDRSTR, &ICOLS, &IROWS, &I1, &I2, &DLAT, &DLON, &DH, &ITY, &ITM, &ITD, &I3, &I4, &I5, &DF, &D1, &D2, &D3 );

   printf( "%s\n", HDRSTR );
   printf( "%d\n", ICOLS );
   printf( "%d\n", IROWS );

   return 0;
}

! FILE: try6F.f90
subroutine read_header( HDRSTR, ICOLS, IROWS, I1, I2, DLAT, DLON, DH, ITY, ITM, ITD, I3, I4, I5, DF, D1, D2, D3 )
  integer :: IUN43
  integer :: IRECL
  integer, intent(out) :: ICOLS, IROWS
  integer, intent(out) :: I1, I2
  real, intent(out) :: DLAT, DLON, DH
  integer, intent(out) :: ITY, ITM, ITD
  integer, intent(out) :: I3, I4, I5
  real, intent(out) :: DF
  real, intent(out) :: D1, D2, D3
  character(16), intent(out) :: HDRSTR
  IRECL=20
  OPEN( NEWUNIT=IUN43, FILE='data.db', ACCESS='DIRECT', RECL=IRECL, FORM='UNFORMATTED', STATUS='UNKNOWN' ) 
  READ(IUN43,REC=1) HDRSTR
  READ(IUN43,REC=2) IRECL, ICOLS, IROWS
  READ(IUN43,REC=3) I1,I2
  READ(IUN43,REC=4) DLAT, DLON, DH
  READ(IUN43,REC=5) ITY,ITM,ITD
  READ(IUN43,REC=6) I3,I4,I5
  READ(IUN43,REC=7) DF
  close(IUN43)
end subroutine

Build and link with:

gfortran -c try6F.f90
g++ -c try6C.cpp
g++ -o try6 try6C.o try6F.o -lgfortran

Something Extra

A Fortran program to create the raw binary output.

program hello
  integer :: ILOOP
  integer :: IUN43
  integer :: IRECL, ICOLS, IROWS
  integer :: I1, I2
  real :: DLAT, DLON, DH
  integer :: ITY, ITM, ITD
  integer :: I3, I4, I5
  real :: DF
  real :: D1, D2, D3
  print *, 'Hello, World!'
  IRECL=20
  ICOLS=4
  IROWS=5
  DLAT=1.2
  DLON=3.4
  DH=5.6
  ITY=6
  ITM=7
  ITD=8
  I1=11
  I2=12
  I3=13
  I4=14
  I5=15
  DF=7.8
  D1=10.1
  D2=10.2
  D3=10.3
  OPEN( NEWUNIT=IUN43, FILE='data.db', ACCESS='DIRECT', RECL=IRECL, FORM='UNFORMATTED', STATUS='NEW' ) 
  WRITE(IUN43,REC=1) "Description texta"
  WRITE(IUN43,REC=2) IRECL, ICOLS, IROWS
  WRITE(IUN43,REC=3) I1,I2
  WRITE(IUN43,REC=4) DLAT, DLON, DH
  WRITE(IUN43,REC=5) ITY,ITM,ITD
  WRITE(IUN43,REC=6) I3,I4,I5
  WRITE(IUN43,REC=7) DF
  do ILOOP=8,10
      WRITE(IUN43,REC=ILOOP) I1,I2,D1,D2,D3
  end do 
  close(IUN43)
end program hello
