C++ binary file is not read correctly

Question

I am reading a big-endian file on a little-endian Intel processor in C++. The file is a generic binary file. I have tried reading it with both open() and fopen(), but both give the same wrong result. The file holds the training images from the MNIST dataset. It starts with 4 header fields, each 32 bits in size and stored in big-endian order. My code runs, but it does not give the right value for the 2nd header; the other three headers come out fine. I even opened the file in a hex editor to check whether the value itself might be wrong, but it is correct. For some reason the program reads only the second header wrong. Here is the code that deals with reading the headers:

void DataHandler::readInputData(std::string path){
    uint32_t headers[4];
    char bytes[4];
    std::ifstream file;
    //I tried both open() and fopen() as seen below
    file.open(path.c_str(), std::ios::binary | std::ios::in);
    //FILE* f = fopen(path.c_str(), "rb");
    if (file)
    {
        int i = 0;
        while (i < 4)//4 headers
        {
            //if (fread(bytes, sizeof(bytes), 1, f))
            //{
            //    headers[i] = format(bytes);
            //    ++i;
            //}
            file.read(bytes, sizeof(bytes));
            headers[i++] = format(bytes);
        }
        printf("Done getting images file header.\n");
        printf("magic: 0x%08x\n", headers[0]);
        printf("nImages: 0x%08x\n", headers[1]);//THIS IS THE ONE THAT IS GETTING READ WRONG
        printf("rows: 0x%08x\n", headers[2]);
        printf("cols: 0x%08x\n", headers[3]);
        exit(1);
        //reading rest of the file code here
    }
    else
    {
        printf("Invalid Input File Path\n");
        exit(1);
    }
}

//converts big endian to little endian (required for Intel processors)
uint32_t DataHandler::format(const char * bytes) const
{
    return (uint32_t)((bytes[0] << 24) |
        (bytes[1] << 16) |
        (bytes[2] << 8) |
        (bytes[3]));
}

The output I am getting is:

Done getting images file header.
magic: 0x00000803
nImages: 0xffffea60
rows: 0x0000001c
cols: 0x0000001c

nImages should be 60,000, or 0x0000ea60 in hex, but it is being read as 0xffffea60 for some reason. In a hex editor, the 2nd 32-bit number in the file shows as 0000ea60, yet the program reads it wrong.

Answer 1

It seems that char is signed in your environment, so the byte 0xEA in the data is sign-extended to 0xFFFFFFEA when it is promoted to int. Shifted left by 8, that becomes 0xFFFFEA00, and ORing it into the result sets the upper 16 bits, which is why you see 0xffffea60 instead of 0x0000ea60.
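The arithmetic can be reproduced in isolation. The following is only a sketch: the helper names are made up for illustration, and it assumes a typical platform with 8-bit two's-complement char and 32-bit int. It widens each of the header bytes 00 00 EA 60 the way a signed char would be widened and then combines them with unsigned shifts, so the demonstration itself does not rely on shifting a negative value.

```cpp
#include <cstdint>

// Hypothetical helper: widen one byte as if it had been stored in a signed
// 8-bit char and then promoted to int. The top bit is copied upwards.
constexpr std::uint32_t promote_as_signed_char(std::uint8_t raw) {
    return static_cast<std::uint32_t>(
        static_cast<std::int32_t>(static_cast<std::int8_t>(raw)));
}

// Hypothetical helper: the same byte widened from unsigned char.
// The higher bits stay zero.
constexpr std::uint32_t promote_as_unsigned_char(std::uint8_t raw) {
    return static_cast<std::uint32_t>(raw);
}

// Combine the header bytes 00 00 EA 60 after signed-char style widening.
constexpr std::uint32_t combine_signed() {
    return (promote_as_signed_char(0x00) << 24) |
           (promote_as_signed_char(0x00) << 16) |
           (promote_as_signed_char(0xEA) << 8) |
            promote_as_signed_char(0x60);
}

// 0xEA widens to 0xFFFFFFEA, and the combined value matches the bad output.
static_assert(promote_as_signed_char(0xEA) == 0xFFFFFFEAu, "sign-extended");
static_assert(promote_as_unsigned_char(0xEA) == 0x000000EAu, "zero-extended");
static_assert(combine_signed() == 0xFFFFEA60u, "matches the observed nImages");
```

Only 0xEA has its top bit set, which is why the magic number, rows and cols (all bytes below 0x80) were unaffected.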

To prevent this, use unsigned char instead of char, both for the element type of bytes and for the parameter of format().
