Parsing binary files with C/C++

Question

I am currently working with some binary files whose data format is specified in a separate file. My approach is to define structs that correspond to the format and read the data from the buffer piece by piece. For example, if I know that the file begins with a packet header specifying the type and length of the data that follows in the packet, I parse that packet header first.

  #pragma pack (1)
  struct PacketHeader
  {
     uint16_t PacketSize;
     uint16_t PacketType;
  };
  char *buffer = new char[size];
  file.read(buffer, size);
  //read in the PacketHeader
  PacketHeader ph;
  ph = *(PacketHeader *)buffer;
  //switch data type
  switch(ph.PacketType)
  {
     //do something
  }

So far everything works, but problems occur when I do not use the struct approach. For example, I know that at some position in the buffer, following data type A, there is information about the underlying composition of A, say a pair of uint32_t variables. The number of such pairs is specified in A, and since each pair has just two variables, I tried to parse them directly without a struct, e.g.:

  //get pair_num from previous parsed data
  for(int i = 0; i < pair_num; i++)
  {
      std::cout << (uint32_t)(buffer + 2 * i * sizeof(uint32_t))
                << (uint32_t)(buffer + (2 * i + 1) * sizeof(uint32_t))
                << std::endl;
  }

However, the above code does not work: the two values parsed from the file are wrong. So I went back to the struct approach and got the correct result with the following code:

 struct B
 {
    uint32_t v1;
    uint32_t v2;
 };
 B b;
 //get pair_num from previous parsed data
 for(int i = 0; i < pair_num; i++)
 {
     b = *(B *)(buffer + i * 8);
     std::cout << b.v1
               << b.v2
               << std::endl;
 }

I am wondering what the difference between these two methods is. Could anyone give me some insight?

Answer 1

Instead of converting the char * to a uint32_t * and dereferencing it, you are converting the char * pointer itself to a uint32_t, so you print the address (or part of the address on a 64-bit machine) to std::cout. The correct way is:

  for(int i = 0; i < pair_num; i++)
  {
      std::cout << *((uint32_t *)(buffer + 2 * i * sizeof(uint32_t)))
                << *((uint32_t *)(buffer + (2 * i + 1) * sizeof(uint32_t)))
                << std::endl;
  }
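
As a hedged sketch of the same point: the value has to be read from the memory the pointer refers to, not taken from the pointer itself. The helper below, read_u32 (a hypothetical name, not from the question), copies four bytes out of the buffer with std::memcpy rather than dereferencing a cast pointer. Using memcpy is my assumption about the safer choice here, since it also works when the offset is not 4-byte aligned. Like the cast, it interprets the bytes in the host's byte order.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper (not part of the original code): returns the uint32_t
// stored at byte offset `offset` in `buffer`, in host byte order.
// The caller must ensure that offset + 4 does not exceed the buffer size.
std::uint32_t read_u32(const char *buffer, std::size_t offset)
{
    assert(buffer != nullptr);
    std::uint32_t value;
    // Copy the bytes the pointer refers to; the pointer value itself is never printed.
    std::memcpy(&value, buffer + offset, sizeof value);
    return value;
}
```

With this helper, the loop would print read_u32(buffer, 2 * i * sizeof(uint32_t)) and read_u32(buffer, (2 * i + 1) * sizeof(uint32_t)).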

But a much simpler way is to use a pointer of the proper type:

  uint32_t *ptr = (uint32_t *) buffer;
  for(int i = 0; i < pair_num; i++)
  {
      std::cout << *(ptr + 2 * i )
                << *(ptr + 2 * i + 1 )
                << std::endl;
  }

or even:

  uint32_t *ptr = (uint32_t *) buffer;
  for(int i = 0; i < pair_num; i++)
  {
      std::cout << ptr[2 * i]
                << ptr[2 * i + 1]
                << std::endl;
  }

Note that in your second approach (and when reading the header into a struct) you are making an unnecessary copy. I think what you wanted is this instead:

 struct B
 {
    uint32_t v1;
    uint32_t v2;
 };
 //get pair_num from previous parsed data
 for(int i = 0; i < pair_num; i++)
 {
     B *b = (B *)(buffer + i * 8);
     std::cout << b->v1
               << b->v2
               << std::endl;
 }

unless you are copying the structure intentionally for some reason.
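
One possible reason to copy on purpose is alignment: a cast such as (B *)(buffer + i * 8) assumes the offset is suitably aligned for B and that reading a char buffer through a B * is acceptable to the compiler, neither of which is guaranteed in general. The sketch below is a minimal illustration under those assumptions, not a definitive implementation. The helper name read_pairs and its size and offset parameters are hypothetical and do not come from the question; it copies the pairs out with std::memcpy and checks that they fit in the buffer first.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

struct B
{
    std::uint32_t v1;
    std::uint32_t v2;
};
static_assert(sizeof(B) == 8, "B must match the 8-byte pair in the file");

// Hypothetical helper: copies `pair_num` pairs that start at byte `offset`
// of a buffer holding `size` bytes. Returns an empty vector if they do not fit.
std::vector<B> read_pairs(const char *buffer, std::size_t size,
                          std::size_t offset, std::size_t pair_num)
{
    assert(buffer != nullptr);
    std::vector<B> pairs;
    if (offset > size || pair_num > (size - offset) / sizeof(B))
        return pairs; // not enough data in the buffer
    pairs.resize(pair_num);
    if (pair_num != 0)
        std::memcpy(pairs.data(), buffer + offset, pair_num * sizeof(B));
    return pairs;
}
```

The caller can then loop over the returned vector and print v1 and v2 for each element, whatever the alignment of the offset.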

Answer 1: accepted, 2014-12-12 03:15:10
