Incorrect floating-point rounding

Question

With gcc 4.7.3, fegetround() returns FE_TONEAREST. As I understand the C++ reference, this means rounding away from zero. Essentially, it means saving the last bit that was shifted out when adjusting the precision of the mantissa after multiplication (since the product is twice as long as it should be). Afterwards, the saved bit is added to the final mantissa result.

For example, floating-point multiplication gives the following result:

0x38b7aad5 * 0x38b7aad5 = 0x3203c5af

The mantissa after multiplication is

  1011 0111 1010 1010 1101 0101
x 1011 0111 1010 1010 1101 0101
-------------------------------
1[000 0011 1100 0101 1010 1110] [1]000 0101 1001 0101 0011 1001

The [23'b] group holds the significant digits, whereas the [1'b] group holds the last bit shifted out. Note that the mantissa of the result is

[000 0011 1100 0101 1010 1111]

The last bit switched to 1 because the [1'b1] group was added to the truncated mantissa (the [23'b] group) due to the rounding mode.

Here is an example that is stumping me, because it looks to me like the hardware isn't rounding correctly.

0x20922800 * 0x20922800 = 0x01a6e34c (check this on your machine)

  1001 0010 0010 1000 0000 0000
x 1001 0010 0010 1000 0000 0000
-------------------------------
01[01 0011 0111 0001 1010 0110 0][1]00 0000 0000 0000 0000 0000

Final Mantissas:       
Their Result:      01 0011 0111 0001 1010 0110 0
Correct Result(?): 01 0011 0111 0001 1010 0110 1

I've been crunching binary all day, so it's possible I'm missing something simple here. Which answer is correct with the given rounding mode?

Answer 1

When rounding to nearest, IEEE 754 specifies that ties round to even. The candidate ending in 0 is even and the one ending in 1 is odd, so Intel is correct.

Answer 2

First, "rounding to nearest" lacks one detail here: the mode is round to nearest, ties to even.

Quoting the IEEE 754 standard (Section 4.3.1):

roundTiesToEven, the floating-point number nearest to the infinitely precise result shall be delivered; if the two nearest floating-point numbers bracketing an unrepresentable infinitely precise result are equally near, the one with an even least significant digit shall be delivered

In your first example you compute the square of 8.75794e-5, which as a 32-bit float has the bit pattern 0x38b7aad5.

All 24 significand bits of 8.75794e-5 are:

0xb7aad5 = 1.0110111_10101010_11010101

Squaring that gives:

1.0000011_11000101_10101110_10000101_10010101_00111001

It is worth noting that on 32-bit x86 targets the computation is typically performed on the x87 FPU, which operates on an 80-bit extended format; on x86-64, gcc uses SSE by default and multiplies in single precision directly. The rounding to 32 bits works the same way in both cases.

From the Intel® 64 and IA-32 Architectures Software Developer's Manual (Programming with the x87 FPU):

When floating-point, integer, or packed BCD integer values are loaded from memory into any of the x87 FPU data registers, the values are automatically converted into double extended-precision floating-point format (if they are not already in that format).

Now you want to store your result in a 32-bit float:

1.[0000011_11000101_10101110]10000101_10010101_00111001

And this is where rounding modes matter. IEEE 754 requires four of them for binary formats, but let's focus on the default one (round to nearest, ties to even), since that is the one under discussion.

Now that your FPU has computed the full result (the 80-bit format has 64 significand bits), it must round it to fit into 32 bits (24 significand bits). The 23 bits that are explicitly stored are shown in brackets above.

Here the "even" part of round to nearest plays no role, because the bits to the right of the bracket do not put the value exactly halfway between:

1.[0000011_11000101_10101111]
and
1.[0000011_11000101_10101110]

Instead, the value is nearer to

1.[0000011_11000101_10101111]

This is why your result is 0x3203C5AF (stored fraction bits 0x03C5AF).

Now for the problematic result: squaring 2.4759832E-19 (0x20922800).

The 24 significand bits of 2.4759832E-19 are:

0x922800 = 1.0010010_00101000_00000000

and squared:

1.[0100110_11100011_01001100]10000000_00000000_0000000

And here is where the "even" part really matters. Your value now lies exactly halfway between:

1.[0100110_11100011_01001101]
and
1.[0100110_11100011_01001100]

These two values are said to bracket your value. Of the two, you must choose the even one (the latter, since its least significant bit is 0).

Now you know why the 24 significand bits of your result are 0xA6E34C (nearest, even) and not 0xA6E34D (equally near, but odd).

Answer 1: score 6, accepted, 2013-12-06 23:48:25

Answer 2: score 2, 2013-12-10 21:13:01
