Reading int from binary file incorrect when using C++
I have a complex structured binary file. I created a parser in Python to read the binary file, convert the raw bytes to the correct values, and save the data to CSV so that the values can be analyzed. This works well, but some of the files are extremely large (i.e. 20+ GB) and take many hours to parse. I am trying to speed this up by implementing the same process in C++.
Below is an excerpt that reads a control word at the beginning of each logical record; the control word specifies the size of the record. For a specific case the control word is 128 (a 4-byte, big-endian int). In Python I do:
import numpy as np

x = open(str(self.filename), "rb")
cw_d_type = np.dtype('>i4')
temp = np.frombuffer(x.read(cw_d_type.itemsize), dtype=cw_d_type)
The value in temp[0] after this is 128. Now I attempt to do the same thing in C++ using the following code:
#include <iostream>
#include <fstream>
#include <cstdint>
using namespace std;
struct control_word
{
    uint32_t chunk_size;
};
int main()
{
    // define my stream
    ifstream in_f("Y:/path_to_binary_file/binary_file", ios::binary | ios::in | ios::ate);
    // find the size of the file
    streamoff file_size = in_f.tellg();
    // go to the beginning of the file
    in_f.seekg(0, ios::beg);
    // read the control word into the struct
    control_word cw;
    in_f.read(reinterpret_cast<char*>(&cw), sizeof(cw));
    cout << cw.chunk_size << endl;
    // ... continue reading the rest of the structures
}
The result is cw.chunk_size = 2147483648. I know that I am reading the correct place in the file because the next structure that I read contains a 32 bit string and it is read properly; if I were not in the right location in the file, that result would not be correct.
If I change my control word structure from an int to a char[4], then the result is [0][0][0][-128], which is almost correct except that the negative sign is there.
All of the doubles and floats that I read in show the same kind of problem. The only values that seem to be read properly are char values. It has been a number of years since I last programmed in C++. Is there something that I am forgetting to do to properly map my binary data into my structures?
I have read many questions concerning reading binary files and can't figure out why I am getting these weird values. The closest answer that I have found is here, and the solution there was that the user was not mapping the chunk of binary into the correct type. I know that this is not the case for me because in my Python implementation I read the chunk as an int and get the value I am expecting.
According to the documentation, the > in the numpy.dtype specifies big-endian format. You are most probably running your code on an Intel or compatible CPU, which is little-endian. You need to convert your uint32_t field using the ntohl() function:
// ntohl() is declared in <arpa/inet.h> on POSIX systems
// and in <winsock2.h> on Windows
in_f.read(reinterpret_cast<char*>(&cw), sizeof(cw));
cw.chunk_size = ntohl( cw.chunk_size );
cout << cw.chunk_size << endl;
Details about Endianness