C - binary reading, fread is inverting the order

fread(cur, 2, 1, fin)

I am sure I will feel stupid when I get an answer to this, but what is happening?

cur is a pointer to code_cur, a short (2 bytes); fin is a stream opened for binary reading.

If my file is 00101000 01000000

what I get in the end is

code_cur = 01000000 00101000

Why is that? I am not giving any context yet because the problem really boils down to this (at least for me) unexpected behaviour.

And, in case this is the norm, how can I obtain the desired effect?

PS

I should probably add that, in order to 'view' the bytes, I am printing their integer value.

printf("%d\n",code_cur)

I tried it a couple of times and it seemed reliable.

As others have pointed out, you need to learn more about endianness.

You may not know it, but your file is (luckily) in network byte order, which is big-endian. Your machine is little-endian, so a correction is needed. Needed or not, this correction is always recommended, because it makes your program read the file the same way on any machine.

Do something similar to this:

{
    uint16_t tmp;

    if (1 == fread(&tmp, 2, 1, fin)) { /* Check fread finished well */
        code_cur = ntohs(tmp);
    } else {
        /* Treat error however you see fit */
        perror("Error reading file");
        exit(EXIT_FAILURE); // requires #include <stdlib.h>
    }
}

ntohs() will convert your value from file order to your machine's order, whatever that is, big or little endian.
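To see the conversion in isolation, here is a minimal sketch that skips the file and starts from the two bytes as fread() would leave them in memory. The helper name be_bytes_to_host16 is made up for this illustration, and it assumes a POSIX system that provides arpa/inet.h.

```c
#include <arpa/inet.h> /* ntohs(); POSIX, not ISO C */
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: interprets two bytes stored in file (big-endian)
 * order as a 16-bit value in the host's byte order. */
uint16_t be_bytes_to_host16(const unsigned char bytes[2]) {
    uint16_t raw;
    assert(bytes != NULL);
    memcpy(&raw, bytes, sizeof raw); /* same memory layout fread() would produce */
    return ntohs(raw);               /* no-op on big-endian hosts, byte swap on little-endian ones */
}
```

Called with the bytes from the question, {0x28, 0x40}, it returns 0x2840 whether the host is big or little endian.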

This is why htonl and htons (and friends) exist. They're not part of the C standard library, but they're available on pretty much every platform that does networking.

"htonl" means "host to network, long"; "htons" means "host to network, short". In this context, "long" means 32 bits, and "short" means 16 bits (even if the platform declares "long" to be 64 bits). Basically, whenever you read something from the "network" (or in your case, the stream you're reading from), you pass it through "ntoh*". When you're writing out, you pass it through "hton*".

You can permute those function names in whatever way you want, except for the silly ones (no, there is no ntons, and no stonl either).
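The read/write pairing described above can be sketched as two small helpers. The names write_u32_net and read_u32_net are hypothetical, and the code assumes a POSIX system with arpa/inet.h; it is an illustration, not a drop-in library.

```c
#include <arpa/inet.h> /* htonl(), ntohl(); POSIX, not ISO C */
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: writes value to f in network (big-endian) order.
 * Returns 1 on success, 0 on failure. */
int write_u32_net(FILE *f, uint32_t value) {
    uint32_t wire;
    assert(f != NULL);
    wire = htonl(value); /* host order -> network order before writing */
    return fwrite(&wire, sizeof wire, 1, f) == 1;
}

/* Hypothetical helper: reads a network-order 32-bit value from f into *out.
 * Returns 1 on success, 0 on failure. */
int read_u32_net(FILE *f, uint32_t *out) {
    uint32_t wire;
    assert(f != NULL && out != NULL);
    if (fread(&wire, sizeof wire, 1, f) != 1)
        return 0;
    *out = ntohl(wire); /* network order -> host order after reading */
    return 1;
}
```

A value written with write_u32_net always has its most significant byte first in the file, so read_u32_net recovers it on a machine of either endianness.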

As others have pointed out, this is an endianness issue.

The position of the Most Significant Byte differs between your file and your machine. Your file is big-endian (MSB first) and your machine is little-endian (MSB last, or LSB first).
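If you want to check which layout your own machine uses, one possible sketch is to copy the bytes of a 16-bit value out of memory and look at which one comes first. Both function names below are invented for this example, and the check assumes the host is either plain big-endian or plain little-endian.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: copies the in-memory bytes of value into out[0..1],
 * lowest address first, so the host's byte order becomes visible. */
void bytes_in_memory(uint16_t value, unsigned char out[2]) {
    assert(out != NULL);
    memcpy(out, &value, sizeof value);
}

/* Returns 1 on a little-endian host, 0 on a big-endian one. */
int host_is_little_endian(void) {
    unsigned char b[2];
    bytes_in_memory(0x2840, b);
    return b[0] == 0x40; /* little-endian stores the LSB at the lowest address */
}
```

On a little-endian machine the value 0x2840 sits in memory as 0x40 0x28, which is exactly the reverse of the byte order in the question's file.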

To understand what's happening, let's create a file with some binary data:

    uint8_t buffer[2] = {0x28, 0x40}; // hexadecimal for 00101000 01000000
    FILE * fp = fopen("file.bin", "wb"); // opens or creates file as binary
    fwrite(buffer, 1, 2, fp); // write two bytes to file
    fclose(fp);

The file.bin was created and holds the binary value 00101000 01000000. Let's read it:

    uint8_t buffer[2] = {0, 0};
    FILE * fp = fopen("file.bin", "rb");
    fread(buffer, 1, 2, fp); // read two bytes from file
    fclose(fp);
    printf("0x%02x, 0x%02x\n", buffer[0], buffer[1]);
    // The above prints 0x28, 0x40, as expected and in the order we wrote previously

So everything works well because we are reading byte by byte, and bytes don't have endianness (technically they do, since a byte is always Most Significant Bit first regardless of your machine, but you can think of them as if they didn't to simplify the understanding).

Anyway, as you noticed, here's what happens when you try to read the short directly:

    FILE * fp_alt = fopen("file.bin", "rb");
    short incorrect_short = 0;
    fread(&incorrect_short, 1, 2, fp_alt);
    fclose(fp_alt);
    printf("Read short as machine endianness: %hd\n", incorrect_short);
    printf("In hex, that is 0x%04x\n", (unsigned short) incorrect_short);
    // On a little-endian machine we get the incorrect decimal 16424 and hex 0x4028!
    // The machine stores the LSB first, so the two bytes of our short end up swapped

The worst part is that on a big-endian machine the code above would return the correct number, leaving you unaware that your code is endian-specific and not portable between processors!

It's nice to use ntohs from arpa/inet.h to convert the endianness, but I find it strange: it is a (non-standard) header made for network communication being used to solve an issue that comes from reading files, and it solves it by reading the value incorrectly from the file and then 'translating' the incorrect value instead of just reading it correctly.

In higher-level languages we often see functions that read a value with a given endianness directly instead of converting it afterwards, because we (usually) know the structure of a file and its endianness. Just look at JavaScript Buffer's readInt16BE method: straight to the point and easy to use.

Motivated by this simplicity, I created the function below, which reads a 16-bit integer (but it's very easy to change to 8, 32 or 64 bits if you need to):

#include <stdint.h> // necessary for specific int types
#include <stdio.h>  // necessary for FILE and fread

// Reads a single signed 16-bit integer from the stream as Big Endian, advancing the file position
// Writes the value to 'result' pointer
// Returns 1 if succeeds or 0 if it fails
int8_t freadInt16BE(int16_t * result, FILE * f) {
    uint8_t buffer[sizeof(int16_t)];
    if (!result || !f || sizeof(int16_t) != fread((void *) buffer, 1, sizeof(int16_t), f))
        return 0;
    // The parentheses matter: '+' binds tighter than '<<', so 'a << 8 + b' means 'a << (8 + b)'
    *result = (int16_t) ((buffer[0] << 8) | buffer[1]);
    return 1;
}

Usage is simple (error handling omitted for brevity):

    FILE * fp = fopen("file.bin", "rb"); // Open file as binary
    short code_cur = 0;
    freadInt16BE(&code_cur, fp);
    fclose(fp);
    printf("Read Big-Endian (MSB first) short: %hu\n", code_cur);
    printf("In hex, that is 0x%04x\n", code_cur);
    // The above code prints 0x2840 correctly (decimal: 10304)

The function will fail (return 0) if the file doesn't exist or can't be opened (that is, fopen returned NULL), or if it does not contain the 2 bytes to be read at the current position.

As a bonus, if you happen to find a file that is little-endian, you can use this function:

// Reads a single signed 16-bit integer from the stream as Little Endian, advancing the file position
// Writes the value to 'result' pointer
// Returns 1 if succeeds or 0 if it fails
int8_t freadInt16LE(int16_t * result, FILE * f) {
    uint8_t buffer[sizeof(int16_t)];
    if (!result || !f || sizeof(int16_t) != fread((void *) buffer, 1, sizeof(int16_t), f))
        return 0;
    // The parentheses matter: '+' binds tighter than '<<', so 'a << 8 + b' means 'a << (8 + b)'
    *result = (int16_t) ((buffer[1] << 8) | buffer[0]);
    return 1;
}
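As for the other widths mentioned earlier, a 32-bit big-endian reader could look roughly like the sketch below. The name fread_u32_be is hypothetical, it returns an unsigned value to keep the shifts simple, and unlike the functions above it treats NULL arguments as a programming error (assert) rather than returning 0.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical 32-bit counterpart: reads four bytes from f and assembles
 * them MSB first. Returns 1 on success, 0 if fewer than 4 bytes were read. */
int fread_u32_be(uint32_t *result, FILE *f) {
    unsigned char b[4];
    assert(result != NULL && f != NULL);
    if (fread(b, 1, sizeof b, f) != sizeof b)
        return 0;
    /* Cast before shifting so the arithmetic happens in 32-bit unsigned */
    *result = ((uint32_t) b[0] << 24) | ((uint32_t) b[1] << 16) |
              ((uint32_t) b[2] << 8)  |  (uint32_t) b[3];
    return 1;
}
```

A little-endian version would use the same shifts with the indices of b reversed.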
