Parsing binary files using C/C++
I am currently working with some binary files whose data format is described in a specification. My approach is to define a corresponding struct for each record and read the data from the buffer one record at a time. For example, if I know that the file begins with a packet header giving the type and length of the data that follows in the packet, I parse that header first:
#pragma pack (1)
struct PacketHeader
{
uint16_t PacketSize;
uint16_t PacketType;
};
char *buffer = new char[size];
file.read(buffer, size);
//read in the PacketHeader
PacketHeader ph;
ph = *(PacketHeader *)buffer;
//switch data type
switch(ph.PacketType)
{
//do something
}
So far everything works, but problems occur when I do not use the struct method. For example, I know that at some position in the buffer, following data type A, there is some information about the underlying composition of A: pairs of values, each pair made up of two uint32_t variables. The number of such pairs is specified in A, and since each pair has just two variables, I tried to parse them directly without any struct, e.g.:
//get pair_num from previous parsed data
for(int i = 0; i < pair_num; i++)
{
std::cout << (uint32_t)(buffer + 2 * i * sizeof(uint32_t))
<< (uint32_t)(buffer + (2 * i + 1) * sizeof(uint32_t))
<< std::endl;
}
However, the code above does not work: the two values parsed from the file are wrong. So I went back to the struct method and got the correct result with the following code:
struct B
{
uint32_t v1;
uint32_t v2;
};
B b;
//get pair_num from previous parsed data
for(int i = 0; i < pair_num; i++)
{
b = *(B *)(buffer + i * 8);
std::cout << b.v1
<< b.v2
<< std::endl;
}
I am just wondering: what is the difference between these two methods? Could anyone give me some insight?
Instead of converting the char * to a uint32_t * and dereferencing it, you are converting the char * pointer itself to uint32_t, so you are printing the address (or part of the address on a 64-bit machine) to std::cout. The correct way would be:
for(int i = 0; i < pair_num; i++)
{
std::cout << *((uint32_t *)(buffer + 2 * i * sizeof(uint32_t)))
<< *((uint32_t *)(buffer + (2 * i + 1) * sizeof(uint32_t)))
<< std::endl;
}
But a much simpler way is to use a pointer of the proper type:
uint32_t *ptr = (uint32_t *) buffer;
for(int i = 0; i < pair_num; i++)
{
std::cout << *(ptr + 2 * i )
<< *(ptr + 2 * i + 1 )
<< std::endl;
}
or even:
uint32_t *ptr = (uint32_t *) buffer;
for(int i = 0; i < pair_num; i++)
{
std::cout << ptr[2 * i]
<< ptr[2 * i + 1]
<< std::endl;
}
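To make the difference between the two casts concrete, here is a minimal sketch. The names address_as_integer and value_at are hypothetical and used only for illustration, and value_at assumes the pointer is suitably aligned for uint32_t (checked with an assert).

```cpp
#include <cassert>
#include <cstdint>

// What the original loop effectively did: turn the pointer itself into an
// integer. The result is the address, not the data stored there.
// uintptr_t is used because a 64-bit address does not fit in uint32_t.
inline std::uintptr_t address_as_integer(const char *p)
{
    return reinterpret_cast<std::uintptr_t>(p);
}

// What was intended: reinterpret the pointer as uint32_t * and dereference
// it, which reads the four bytes stored at that address.
inline std::uint32_t value_at(const char *p)
{
    // Assumption made explicit: p must be aligned for uint32_t.
    assert(reinterpret_cast<std::uintptr_t>(p) % alignof(std::uint32_t) == 0);
    return *reinterpret_cast<const std::uint32_t *>(p);
}
```

Printing address_as_integer(buffer) gives a number that changes from run to run, while value_at(buffer) gives the first uint32_t stored in the buffer.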
Note that in your second approach (as well as when reading the header as a struct) you are making an unnecessary copy. I think what you wanted is this instead:
struct B
{
uint32_t v1;
uint32_t v2;
};
//get pair_num from previous parsed data
for(int i = 0; i < pair_num; i++)
{
B *b = (B *)(buffer + i * 8);
std::cout << b->v1
<< b->v2
<< std::endl;
}
unless you are intentionally copying the structure for some reason.
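One caveat about all of the pointer casts above: dereferencing a char * that has been cast to uint32_t * or B * assumes the address is suitably aligned, and the C++ standard does not guarantee that this kind of access works, even though it usually does on common compilers and platforms. A more portable sketch copies the bytes out with std::memcpy instead. The helpers read_u32 and read_pairs are hypothetical names, not part of the original code, and the sketch assumes the file stores values in the host's byte order.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <utility>
#include <vector>

// Hypothetical helper: copy one uint32_t out of a byte buffer.
// memcpy has no alignment requirement, unlike *(uint32_t *)(buffer + offset).
// Assumes the file's byte order matches the host's.
inline std::uint32_t read_u32(const char *buffer, std::size_t offset)
{
    std::uint32_t value;
    std::memcpy(&value, buffer + offset, sizeof(value));
    return value;
}

// Hypothetical helper: read pair_num consecutive (v1, v2) pairs from the
// start of buffer, which holds buffer_size bytes.
inline std::vector<std::pair<std::uint32_t, std::uint32_t>>
read_pairs(const char *buffer, std::size_t buffer_size, std::size_t pair_num)
{
    // Each pair occupies two uint32_t values; refuse to read past the end.
    assert(pair_num * 2 * sizeof(std::uint32_t) <= buffer_size);

    std::vector<std::pair<std::uint32_t, std::uint32_t>> pairs;
    pairs.reserve(pair_num);
    for (std::size_t i = 0; i < pair_num; i++)
    {
        const std::size_t base = 2 * i * sizeof(std::uint32_t);
        pairs.emplace_back(read_u32(buffer, base),
                           read_u32(buffer, base + sizeof(std::uint32_t)));
    }
    return pairs;
}
```

This does copy each value, but a four-byte memcpy is typically compiled down to a single load, so the cost is usually negligible.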