
Floating point binary portability

I have a library that saves large amounts of floating point data to disk in text form. This was apparently done for portability, but because the text files use so much disk space, I've written a function that saves the binary representation of the floating point values directly to disk. I know this doesn't guarantee 100% portability, but I'll only run this on x86(_64) Linux/Windows PCs (and maybe also on Mac and the BSDs).

Is there a way to at least check whether the floating point format the program understands is also okay with the system? And how much incompatibility should I expect from dealing with floating point data in binary form?

Is there a way to at least check whether the floating point format the program understands is also okay with the system?

Test 1: sizeof. Test 2: save a magic floating point value in the header of your on-disk file and, after reading the binary data back from disk, check in the program that it has the right value. This should be safe enough.
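The two tests can be combined into a small file header. The following is only a minimal sketch: the function names, the one-byte size field, and the magic constant `FLOAT_MAGIC` are all assumptions made for illustration, not part of any existing library.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical magic value; it is exactly representable in binary64
 * and has a distinctive, asymmetric bit pattern. */
#define FLOAT_MAGIC (-123456.7890625)

/* Write sizeof(double) as one byte, then the raw bytes of the magic value.
 * Returns 0 on success, -1 on a write error. */
int write_float_header(FILE *fp)
{
    const unsigned char size = (unsigned char)sizeof(double);
    const double magic = FLOAT_MAGIC;

    assert(fp != NULL);
    if (fwrite(&size, 1, 1, fp) != 1) return -1;
    if (fwrite(&magic, sizeof magic, 1, fp) != 1) return -1;
    return 0;
}

/* Read the header back. Returns 0 if the size matches and the magic value
 * round-trips, -1 on a read error or any mismatch. */
int check_float_header(FILE *fp)
{
    unsigned char size;
    double magic;

    assert(fp != NULL);
    if (fread(&size, 1, 1, fp) != 1) return -1;
    if (size != sizeof(double)) return -1;          /* Test 1: sizeof */
    if (fread(&magic, sizeof magic, 1, fp) != 1) return -1;
    return (magic == FLOAT_MAGIC) ? 0 : -1;         /* Test 2: magic value */
}
```

A reader would call `check_float_header` right after opening the file and refuse to load the raw data (or fall back to the text format) if it returns -1.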

And how much incompatibility should I expect from dealing with floating point data in binary form?

Very little. If, as you say, you're staying with just one hardware architecture (x86), you'll be fine. If you have a limited set of supported architectures, just test all of them. On x86 everyone uses hardware floating point, which limits how creative implementations can be (pretty much not at all). Even across architectures, every implementation I know of that uses IEEE 754 floating point has the same binary representation for the same endianness.

Floating point has the odd problem that there isn't a widely used standard for its binary on-disk/on-wire representation. That said, everyone I've looked at does one of two things: either write strings, or store the bit pattern in an equally sized integer, adjust for endianness, and brutally cast back to float.
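The second approach might look roughly like the sketch below. It assumes the host `double` is IEEE 754 binary64 and that integers and doubles share the same byte order in memory; the function names are made up for illustration. It uses `memcpy` rather than a pointer cast, which avoids aliasing problems while doing the same job.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Encode x as 8 bytes, least significant byte first, independent of the
 * host's integer byte order. */
void encode_double_le(double x, unsigned char out[8])
{
    uint64_t bits;
    int i;

    assert(sizeof bits == sizeof x);
    memcpy(&bits, &x, sizeof bits);      /* copy the bit pattern, no conversion */
    for (i = 0; i < 8; i++)
        out[i] = (unsigned char)((bits >> (8 * i)) & 0xFF);
}

/* Decode 8 little-endian bytes back into a double. */
double decode_double_le(const unsigned char in[8])
{
    uint64_t bits = 0;
    double x;
    int i;

    for (i = 0; i < 8; i++)
        bits |= (uint64_t)in[i] << (8 * i);
    memcpy(&x, &bits, sizeof x);
    return x;
}
```

Because the shifts operate on the integer's value rather than its memory layout, the bytes written are the same on little-endian and big-endian hosts.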

Look up the binary portability website: https://github.com/MalcolmMcLean/ieee754

The function to write an IEEE 754 double portably is quite long, but it's just a cut-and-paste job. There's also a float version.

#include <float.h>  /* DBL_MAX */
#include <stdio.h>  /* FILE, fputc, ferror */

/*
* write a double to a stream in ieee754 format regardless of host
*  encoding.
*  x - number to write
*  fp - the stream
*  bigendian - set to write big bytes first, else write little bytes
*              first
*  Returns: 0 on success, nonzero on error
*  Notes: different NaN types and negative zero not preserved.
*         if the number is too big to represent it will become infinity
*         if it is too small to represent it will become zero.
*/
int fwriteieee754(double x, FILE *fp, int bigendian)
{
    int shift;
    unsigned long sign, exp, hibits, hilong, lowlong;
    double fnorm, significand;
    int expbits = 11;
    int significandbits = 52;

    /* zero (can't handle signed zero) */
    if (x == 0)
    {
        hilong = 0;
        lowlong = 0;
        goto writedata;
    }
    /* infinity */
    if (x > DBL_MAX)
    {
        hilong = 1024 + ((1 << (expbits - 1)) - 1);
        hilong <<= (31 - expbits);
        lowlong = 0;
        goto writedata;
    }
    /* -infinity */
    if (x < -DBL_MAX)
    {
        hilong = 1024 + ((1 << (expbits - 1)) - 1);
        hilong <<= (31 - expbits);
        hilong |= (1UL << 31); /* unsigned constant: 1 << 31 overflows a 32-bit int */
        lowlong = 0;
        goto writedata;
    }
    /* NaN - dodgy because some compilers optimise out this test (for
    *example under fast-math options); C99's isnan() is an alternative */
    if (x != x)
    {
        hilong = 1024 + ((1 << (expbits - 1)) - 1);
        hilong <<= (31 - expbits);
        lowlong = 1234;
        goto writedata;
    }

    /* get the sign */
    if (x < 0) { sign = 1; fnorm = -x; }
    else { sign = 0; fnorm = x; }

    /* get the normalized form of f and track the exponent */
    shift = 0;
    while (fnorm >= 2.0) { fnorm /= 2.0; shift++; }
    while (fnorm < 1.0) { fnorm *= 2.0; shift--; }

    /* check for denormalized numbers */
    if (shift < -1022)
    {
        while (shift < -1022) { fnorm /= 2.0; shift++; }
        shift = -1023;
    }
    /* out of range. Set to infinity */
    else if (shift > 1023)
    {
        hilong = 1024 + ((1 << (expbits - 1)) - 1);
        hilong <<= (31 - expbits);
        hilong |= (sign << 31);
        lowlong = 0;
        goto writedata;
    }
    else
        fnorm = fnorm - 1.0; /* take the significant bit off mantissa */

    /* calculate the integer form of the significand */
    /* hold it in a  double for now */

    /* the product is an exact integer for a binary64 input, so no rounding term is needed */
    significand = fnorm * (double)(1LL << significandbits);


    /* get the biased exponent */
    exp = shift + ((1 << (expbits - 1)) - 1); /* shift + bias */

    /* put the data into two longs (for convenience) */
    hibits = (unsigned long)(significand / 4294967296.0);
    hilong = (sign << 31) | (exp << (31 - expbits)) | hibits;
    lowlong = (unsigned long)(significand - hibits * 4294967296.0);

writedata:
    /* write the bytes out to the stream */
    if (bigendian)
    {
        fputc((hilong >> 24) & 0xFF, fp);
        fputc((hilong >> 16) & 0xFF, fp);
        fputc((hilong >> 8) & 0xFF, fp);
        fputc(hilong & 0xFF, fp);

        fputc((lowlong >> 24) & 0xFF, fp);
        fputc((lowlong >> 16) & 0xFF, fp);
        fputc((lowlong >> 8) & 0xFF, fp);
        fputc(lowlong & 0xFF, fp);
    }
    else
    {
        fputc(lowlong & 0xFF, fp);
        fputc((lowlong >> 8) & 0xFF, fp);
        fputc((lowlong >> 16) & 0xFF, fp);
        fputc((lowlong >> 24) & 0xFF, fp);

        fputc(hilong & 0xFF, fp);
        fputc((hilong >> 8) & 0xFF, fp);
        fputc((hilong >> 16) & 0xFF, fp);
        fputc((hilong >> 24) & 0xFF, fp);
    }
    return ferror(fp);
}

You can look at the new (C11) and old macros in the header <float.h>; see page 46, section 5.2.4.2.2, "Characteristics of floating types".

In general you should be fine directly reading and writing the binary data: the IEEE 754 binary interchange format is pretty much standard outside of a few niche areas. You can use the __STDC_IEC_559__ macro to check.
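As a rough illustration, a program could combine the `__STDC_IEC_559__` macro with the `<float.h>` characteristics. The function names below are hypothetical. Note that a compiler may use IEEE 754 formats without defining `__STDC_IEC_559__`, so the sketch treats the macro as a hint and checks the format parameters separately.

```c
#include <assert.h>
#include <float.h>
#include <limits.h>

/* Returns 1 if the implementation claims IEC 60559 (IEEE 754) conformance. */
int compiler_claims_iec559(void)
{
#ifdef __STDC_IEC_559__
    return 1;
#else
    return 0;
#endif
}

/* Returns 1 if double has the size, radix, precision and exponent range
 * of IEEE 754 binary64, 0 otherwise. This says nothing about byte order. */
int double_looks_like_binary64(void)
{
    return CHAR_BIT == 8
        && sizeof(double) == 8
        && FLT_RADIX == 2
        && DBL_MANT_DIG == 53
        && DBL_MAX_EXP == 1024
        && DBL_MIN_EXP == -1021;
}

/* Abort early (in builds with assertions enabled) on an unexpected format. */
void require_binary64(void)
{
    assert(double_looks_like_binary64());
}
```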

As noted in this question, one thing the spec does not specify is the precise mapping of bits to bytes, so there is potential for endianness issues (though probably not if you're exclusively using x86/x86_64). It might be a good idea to include a floating point check value at the start of your stream (note that checking the endianness of your integers is not sufficient, as it is technically possible for integers and floating point values to have different endianness).
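One way to probe the byte order of `double` itself, rather than of integers, is to inspect the bytes of a value whose binary64 encoding is known. This is a sketch only: it assumes the host `double` is IEEE 754 binary64, and the enum and function names are invented for the example. The probe value is built so that its encoding is 0x3FF123456789ABCD, with every byte different.

```c
#include <assert.h>
#include <string.h>

enum float_order { FLOAT_ORDER_LITTLE, FLOAT_ORDER_BIG, FLOAT_ORDER_UNKNOWN };

/* Compare the in-memory bytes of a known double against its big-endian
 * encoding and the reverse of it. Anything else (for example a
 * word-swapped layout) is reported as unknown. */
enum float_order detect_double_byte_order(void)
{
    static const unsigned char big[8] =
        { 0x3F, 0xF1, 0x23, 0x45, 0x67, 0x89, 0xAB, 0xCD };
    /* 1 + 0x123456789ABCD / 2^52: exact in binary64 */
    const double probe = 1.0 + (double)0x123456789ABCDULL / 4503599627370496.0;
    unsigned char raw[8];
    int i, is_big = 1, is_little = 1;

    assert(sizeof probe == sizeof raw);
    memcpy(raw, &probe, sizeof raw);
    for (i = 0; i < 8; i++) {
        if (raw[i] != big[i])     is_big = 0;
        if (raw[i] != big[7 - i]) is_little = 0;
    }
    if (is_little) return FLOAT_ORDER_LITTLE;
    if (is_big)    return FLOAT_ORDER_BIG;
    return FLOAT_ORDER_UNKNOWN;
}
```

On x86 and x86_64 this should report little-endian; a writer could store the result in the file header so that a reader on a different machine knows whether to swap bytes.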

If you're writing text, one alternative to consider is the hex float format, which can be much faster to read and write than decimal formats (though not as fast as the raw binary interchange format). Unfortunately, though it is part of both the IEEE and C99 specs, it has been poorly supported by the MSVC compiler (though this may change now that it is part of C++).
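In C99 the hex float format is available through the `%a` conversion of the `printf` family, and `strtod` accepts it on input. A minimal sketch, with hypothetical helper names, assuming a C99-conforming library:

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Format x as hex float text. With no precision given, %a prints enough
 * hex digits to represent a binary double exactly.
 * Returns the text length, or -1 if buf is too small or formatting fails. */
int double_to_hex_text(double x, char *buf, size_t buflen)
{
    int n;

    assert(buf != NULL);
    n = snprintf(buf, buflen, "%a", x);
    return (n < 0 || (size_t)n >= buflen) ? -1 : n;
}

/* Parse hex float text (or ordinary decimal text) back into a double. */
double hex_text_to_double(const char *text)
{
    assert(text != NULL);
    return strtod(text, NULL);
}
```

Unlike a decimal format with too few digits, the round trip through this text form reproduces the original value bit for bit, including for values such as 0.1 that have no finite decimal-to-binary correspondence.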
