Writing and reading long int value in C code

I'm working on a file format that should be written and read on several different operating systems and computers. Some of those computers will be x86 machines, others x86-64. Other processors may come into play, but I'm not concerned about them yet.

This file format should contain several numbers that would be read like this:

struct LongAsChars{
    char c1, c2, c3, c4;
};

long readLong(FILE* file){
    int b1 = fgetc(file);
    int b2 = fgetc(file);
    int b3 = fgetc(file);
    int b4 = fgetc(file);
    if(b1<0||b2<0||b3<0||b4<0){
        //throwError
    }

    LongAsChars lng;
    lng.c1 = (char) b1;
    lng.c2 = (char) b2;
    lng.c3 = (char) b3;
    lng.c4 = (char) b4;

    long* value = (long*) &lng;

    return *value;
}

and written as:

void writeLong(long x, FILE* f){
    long* xptr = &x;
    LongAsChars* lng = (LongAsChars*) xptr;
    fputc(lng->c1, f);
    fputc(lng->c2, f);
    fputc(lng->c3, f);
    fputc(lng->c4, f);
}

Although this seems to work on my computer, I'm concerned that it may not work on others, or that the file format may end up differing across computers (32-bit vs. 64-bit machines, for example). Am I doing something wrong? How should I implement my code so that it uses a constant number of bytes per number?

Should I just use fread instead (which might also make my code faster)?

Use the types in stdint.h to ensure you get the same number of bytes in and out.

Then you're just left with dealing with endianness issues, which your code probably doesn't really handle.

Serializing the long through an aliased char* leaves you with different byte orders in the written file on platforms with different endianness.

You should decompose the bytes something like so:

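/* val is a uint32_t; unsigned char avoids sign-extension surprises */
unsigned char c1 = (val >>  0) & 0xff;
unsigned char c2 = (val >>  8) & 0xff;
unsigned char c3 = (val >> 16) & 0xff;
unsigned char c4 = (val >> 24) & 0xff;

And recompose them using something like:

/* widen each byte before shifting so the shift cannot overflow an int */
val = ((uint32_t)c4 << 24) |
      ((uint32_t)c3 << 16) |
      ((uint32_t)c2 <<  8) |
      ((uint32_t)c1 <<  0);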
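
To make the decompose/recompose idea concrete, here is a minimal sketch of a matched writer and reader that always put the least significant byte first in the file. The function names write_u32_le and read_u32_le and the 0 / -1 return convention are my own assumptions, not anything from the question; the sketch also uses fread/fwrite on a small byte buffer, which the question asked about.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: write v as exactly four bytes, least significant
   byte first. Returns 0 on success, -1 on a write error. */
int write_u32_le(uint32_t v, FILE *f)
{
    assert(f != NULL);
    const unsigned char b[4] = {
        (unsigned char)((v >>  0) & 0xffu),
        (unsigned char)((v >>  8) & 0xffu),
        (unsigned char)((v >> 16) & 0xffu),
        (unsigned char)((v >> 24) & 0xffu)
    };
    return fwrite(b, 1, sizeof b, f) == sizeof b ? 0 : -1;
}

/* Hypothetical helper: read four bytes written by write_u32_le.
   Returns 0 on success, -1 on EOF or a read error (*out is untouched). */
int read_u32_le(uint32_t *out, FILE *f)
{
    assert(out != NULL && f != NULL);
    unsigned char b[4];
    if (fread(b, 1, sizeof b, f) != sizeof b)
        return -1;
    *out = ((uint32_t)b[3] << 24) |
           ((uint32_t)b[2] << 16) |
           ((uint32_t)b[1] <<  8) |
           ((uint32_t)b[0] <<  0);
    return 0;
}
```

Because the shifts work on the value rather than on its memory layout, the file contents come out the same on little-endian and big-endian hosts.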

You might also run into issues with endianness. Why not just use something like NetCDF or HDF, which take care of any portability issues that may arise?

Rather than using structures with characters in them, consider a more mathematical approach:

/* file is the FILE* being read; real code should check each fgetc() for EOF */
uint32_t l  = (uint32_t)fgetc(file) << 24;
         l |= (uint32_t)fgetc(file) << 16;
         l |= (uint32_t)fgetc(file) <<  8;
         l |= (uint32_t)fgetc(file) <<  0;

This is a little more direct and clear about what you are trying to accomplish. It can also be implemented in a loop to handle larger numbers.
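
As a rough sketch of the loop version, the function below reads an unsigned big-endian integer of 1 to 8 bytes, most significant byte first, and reports EOF instead of folding it into the result. The name read_uint_be, its parameter order, and the 0 / -1 return convention are assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: read an nbytes-wide (1..8) unsigned integer stored
   most significant byte first. Returns 0 on success, -1 on EOF or error. */
int read_uint_be(FILE *f, unsigned nbytes, uint64_t *out)
{
    assert(f != NULL && out != NULL);
    assert(nbytes >= 1 && nbytes <= 8);

    uint64_t v = 0;
    for (unsigned i = 0; i < nbytes; i++) {
        int c = fgetc(f);
        if (c == EOF)
            return -1;              /* short read: leave *out untouched */
        v = (v << 8) | (uint64_t)c; /* make room, then append the new byte */
    }
    *out = v;
    return 0;
}
```

Calling read_uint_be(file, 4, &v) covers the 32-bit case above, and the same function handles 64-bit fields with nbytes set to 8.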

You don't want to use long int. It can be a different size on different platforms, so it is a non-starter for a platform-independent format. You have to decide what range of values needs to be stored in the file. 32 bits is probably easiest.

You say you aren't worried about other platforms yet. I'll take that to mean you want to retain the possibility of supporting them, in which case you should define the byte order of your file format. x86 is little-endian, so you might think that's the best choice. But big-endian is the "standard" interchange order, if anything is, since it's used in networking.

If you go for big-endian ("network byte order"):

// can't be bothered to support really crazy platforms: it is in
// any case difficult even to exchange files with 9-bit machines,
// so we'll cross that bridge if we come to it.
assert(CHAR_BIT == 8);
assert(sizeof(uint32_t) == 4);

{
    // write value
    uint32_t value = 23;
    const uint32_t networkOrderValue = htonl(value);
    fwrite(&networkOrderValue, sizeof(uint32_t), 1, file);
}

{
    // read value
    uint32_t networkOrderValue;
    fread(&networkOrderValue, sizeof(uint32_t), 1, file);
    uint32_t value = ntohl(networkOrderValue);
}

Actually, you don't even need to declare two variables; it's just a bit confusing to replace "value" with its network-order equivalent in the same variable.

It works because htonl turns a host-order value into one whose bytes sit in memory in big-endian ("network") order, whatever the host's own byte order is, and ntohl does the reverse. No need to mess with unions, because any stored object in C can be treated as a sequence of char. No need to special-case endianness, because that's what ntohl/htonl are for.
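
For completeness, here is one way the fragment above might be wrapped into functions that also check the fread/fwrite results. This is only a sketch: the names write_u32_net and read_u32_net are made up, and it assumes a POSIX system where htonl and ntohl are declared in arpa/inet.h.

```c
#include <arpa/inet.h> /* htonl, ntohl (POSIX header; an assumption here) */
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: write value in network (big-endian) byte order.
   Returns 0 on success, -1 on a write error. */
int write_u32_net(uint32_t value, FILE *file)
{
    assert(file != NULL);
    const uint32_t networkOrderValue = htonl(value);
    return fwrite(&networkOrderValue, sizeof networkOrderValue, 1, file) == 1 ? 0 : -1;
}

/* Hypothetical helper: read a value written by write_u32_net.
   Returns 0 on success, -1 on EOF or a read error. */
int read_u32_net(uint32_t *value, FILE *file)
{
    assert(value != NULL && file != NULL);
    uint32_t networkOrderValue;
    if (fread(&networkOrderValue, sizeof networkOrderValue, 1, file) != 1)
        return -1;
    *value = ntohl(networkOrderValue);
    return 0;
}
```

Writing 23 this way should always produce the bytes 00 00 00 17 in the file, regardless of the host's endianness.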

If this is too slow, you can start thinking about fiendishly optimised platform-specific byte-swapping, with SIMD or whatever. Or you can use little-endian, on the assumption that most of your platforms will be little-endian and so it's faster "on average" across them. In that case you'll need to write or find "host to little-endian" and "little-endian to host" functions, which of course do nothing on x86.
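
If you do go little-endian, the two conversion functions could look roughly like the sketch below. The names host_to_le32 and le32_to_host are hypothetical (some systems ship similar functions under other names), and this version is written portably with shifts and memcpy rather than tuned for speed; on a little-endian host a decent optimiser will typically reduce both to a no-op.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: return a value whose in-memory bytes are the
   little-endian encoding of host, ready to hand to fwrite. */
uint32_t host_to_le32(uint32_t host)
{
    assert(sizeof(uint32_t) == 4); /* same 8-bit-byte assumption as above */
    const unsigned char b[4] = {
        (unsigned char)((host >>  0) & 0xffu),
        (unsigned char)((host >>  8) & 0xffu),
        (unsigned char)((host >> 16) & 0xffu),
        (unsigned char)((host >> 24) & 0xffu)
    };
    uint32_t le;
    memcpy(&le, b, sizeof le);
    return le;
}

/* Hypothetical helper: interpret the in-memory bytes of le (as filled in
   by fread) as a little-endian number and return it in host order. */
uint32_t le32_to_host(uint32_t le)
{
    unsigned char b[4];
    memcpy(b, &le, sizeof b);
    return ((uint32_t)b[0] <<  0) |
           ((uint32_t)b[1] <<  8) |
           ((uint32_t)b[2] << 16) |
           ((uint32_t)b[3] << 24);
}
```

These drop into the fwrite/fread code in place of htonl and ntohl.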

I believe the most cross-architecture approach is to use the uintXX_t types defined in stdint.h. See the stdint.h man page. For example, an int32_t will give you a 32-bit integer on both x86 and x86-64. I use these by default now in all of my code and have had no trouble, as they are fairly standard across all *NIX systems.

Assuming sizeof(uint32_t) == 4, there are 4! = 24 possible byte orders, of which little-endian and big-endian are the most prominent examples, but others have been used as well (e.g. PDP-endian).

Here are functions for reading and writing 32-bit unsigned integers from a stream, heeding an arbitrary byte order which is specified by the integer whose representation is the byte sequence 0, 1, 2, 3: endian.h, endian.c

The header defines these prototypes

_Bool read_uint32(uint32_t * value, FILE * file, uint32_t order);
_Bool write_uint32(uint32_t value, FILE * file, uint32_t order);

and these constants

LITTLE_ENDIAN
BIG_ENDIAN
PDP_ENDIAN
HOST_ORDER
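
The linked endian.c is not reproduced here, so the following is only my guess at how such functions could work, not the actual implementation. It reads "the integer whose representation is the byte sequence 0, 1, 2, 3" as: byte k of the order value (counting from the least significant) gives the file position of the value's byte k. All names are hypothetical and deliberately differ from the constants above, since some C libraries already define LITTLE_ENDIAN and friends as macros.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Assumed encodings: the integer that, stored in the given order,
   has the in-memory byte sequence 0, 1, 2, 3. */
#define ORDER_LITTLE UINT32_C(0x03020100)
#define ORDER_BIG    UINT32_C(0x00010203)
#define ORDER_PDP    UINT32_C(0x01000302)

/* File position (0..3) of the value's byte k, k = 0 being least significant. */
static unsigned position_of(uint32_t order, unsigned k)
{
    unsigned pos = (unsigned)((order >> (8 * k)) & 0xffu);
    assert(pos < 4);
    return pos;
}

/* The order value describing the machine this code runs on. */
uint32_t host_order(void)
{
    const unsigned char bytes[4] = {0, 1, 2, 3};
    uint32_t order;
    memcpy(&order, bytes, sizeof order);
    return order;
}

bool write_u32_order(uint32_t value, FILE *file, uint32_t order)
{
    unsigned char buf[4];
    for (unsigned k = 0; k < 4; k++)
        buf[position_of(order, k)] = (unsigned char)((value >> (8 * k)) & 0xffu);
    return fwrite(buf, 1, sizeof buf, file) == sizeof buf;
}

bool read_u32_order(uint32_t *value, FILE *file, uint32_t order)
{
    unsigned char buf[4];
    if (fread(buf, 1, sizeof buf, file) != sizeof buf)
        return false;
    uint32_t v = 0;
    for (unsigned k = 0; k < 4; k++)
        v |= (uint32_t)buf[position_of(order, k)] << (8 * k);
    *value = v;
    return true;
}
```

Under this reading, writing 0x0A0B0C0D with ORDER_PDP puts the bytes 0B 0A 0D 0C in the file, and with ORDER_BIG it puts 0A 0B 0C 0D.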
