Why am I losing precision when reading an IEEE-754 float from a 32-bit binary big-endian encoded file?

Question

I am rewriting some Matlab file-processing code in pure C, and I've implemented the following function, which reads 4 bytes from a big-endian binary file that should represent an IEEE-754 single-precision floating-point value. I verified that I'm able to pull the relevant 32-bit data out of the file as an unsigned integer with the following code.

int fread_uint32_be(uint32_t *result, FILE ** fp)
{
    uint8_t data[sizeof(uint32_t)];
    if (!result || !*fp || sizeof(uint32_t) != fread((void *) data, 1, sizeof(uint32_t), *fp))
    {
        return -1;
    }
    *result = ((uint32_t)(data[0]) << 24 | (uint32_t)(data[1]) << 16 |
               (uint32_t)(data[2]) << 8  | (uint32_t)(data[3]));
    return 0;
}

The data I'm expecting has a hex value of 0x1acba506 as returned from this function, which I verified with a hex dump of the data file in big-endian format. Now here comes my problem...

When I cast this value from uint32_t to float, I get a single-precision floating-point value of 449553664.000000, which is close to but not exactly what the Matlab code had, which was 449553670.000000. I've verified that when Matlab reads the binary file, it also gets the same hex value 0x1acba506 that my C code has.

When I cast back from float to uint32_t and print the hex value, I end up with 0x1acba500, which shows that I'm losing precision in the simple cast, i.e. float ans = (float)result; but I don't really understand why. I'm using gcc 7.4 on an x86 machine, and I've verified that sizeof(float) == sizeof(uint32_t). Am I making a poor assumption that the compiler is using IEEE-754 single-precision floating point?

While debugging, I found an online floating-point calculator that makes it seem like the precision is hopelessly lost, but then the question becomes: how is Matlab retaining it?

Answer 1

A single-precision floating-point number fits in a 32-bit register, which is exactly the same size as a 32-bit integer. But not all of the bits of a floating-point number carry precision: some of them (8 bits, as it happens) are used to represent the exponent, and one more holds the sign. That means a single-precision floating-point number cannot represent the same amount of precision as a 32-bit integer.

Thus, when you convert a 32-bit integer to single-precision floating point, some loss of precision is to be expected. If you don't want to lose precision, use the more common double-precision floating-point format, which uses 64 bits, including 53 bits of precision.
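To make the difference concrete, here is a minimal C sketch of the two value conversions. The helper names (u32_to_float, u32_to_double, fits_in_float) are made up for illustration and are not from the question; the sketch assumes float and double are the IEEE 754 single and double formats, as they are with gcc on x86.

```c
#include <stdbool.h>
#include <stdint.h>

/* Value conversion to float: only 24 significant bits survive, so it may round.
   (Assumes float is IEEE 754 single precision.) */
static float u32_to_float(uint32_t v)
{
    return (float)v;
}

/* Value conversion to double: 53 significant bits, exact for every uint32_t.
   (Assumes double is IEEE 754 double precision.) */
static double u32_to_double(uint32_t v)
{
    return (double)v;
}

/* True when v comes back unchanged after a trip through float.
   The comparison is done in double to avoid converting an
   out-of-range float back to uint32_t. */
static bool fits_in_float(uint32_t v)
{
    return (double)u32_to_float(v) == u32_to_double(v);
}
```

With the value from the question, u32_to_float(0x1acba506u) gives 449553664.0f, while u32_to_double(0x1acba506u) gives 449553670.0, so fits_in_float reports false for it.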

Answer 2

The mantissa of an IEEE 754 single-precision float is 24 bits, where the first bit is an implied 1.

Let's look at your two integers; Python is a good tool for debugging these. Their bit representations are

>>> format(449553664, '032b')
'00011010110010111010010100000000'

and

>>> format(449553670, '032b')
'00011010110010111010010100000110'

Now, if we look at the latter number and see how it would fit into a single-precision mantissa, the first 1 bit is the 4th bit from the left, and counting 24 bits from there (inclusive), we get

>>> format(449553670, '032b').lstrip('0')[:24]
'110101100101110100101000'

Clearly the trailing bits 00110 did not fit into the mantissa, and the value was rounded down. Therefore the value of (float)449553670 is represented as

1.10101100101110100101000b * 10b ^ 11100b

i.e. in decimal

1.67471790313720703125 * 2 ^ 28

which equals 449553664.0.
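The same bit counting can be sketched in C using integer arithmetic only. The function name round_to_24_significant_bits is hypothetical, and the sketch assumes the default IEEE 754 rounding mode (round to nearest, ties to even); it is meant to mirror what the conversion does, not to replace the cast.

```c
#include <stdint.h>

/* Round v to 24 significant bits, ties to even, mimicking what a
   uint32_t -> float conversion does under the default rounding mode.
   Returns uint64_t because rounding up near UINT32_MAX reaches 2^32. */
static uint64_t round_to_24_significant_bits(uint32_t v)
{
    int width = 0;                        /* bit position of the leading 1 */
    for (uint32_t t = v; t != 0; t >>= 1) {
        width++;
    }
    if (width <= 24) {
        return v;                         /* already fits in the mantissa */
    }

    int drop = width - 24;                /* low bits that do not fit */
    uint64_t unit = (uint64_t)1 << drop;  /* spacing of floats at this magnitude */
    uint64_t kept = v & ~(unit - 1);      /* v with the dropped bits cleared */
    uint64_t rest = v & (unit - 1);       /* the dropped bits themselves */
    uint64_t half = unit >> 1;

    /* Round up when past the halfway point, or exactly halfway
       with an odd last kept bit. */
    if (rest > half || (rest == half && (kept & unit) != 0)) {
        kept += unit;
    }
    return kept;
}
```

For 0x1acba506 the leading 1 is at bit 29, so 5 bits are dropped; they hold 00110 (6), which is below the halfway point of 16, and the result is 0x1acba500.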

Matlab most likely retains the precision by using doubles rather than floats, just like JavaScript does. All integers up to 53 bits wide can be represented exactly in IEEE 754 double-precision floats.

Answer 1
2 2020-06-07 04:06:11

Answer 2
2 Accepted 2020-06-07 05:29:02
