Floating point rounding

Question

I'm writing code in C++ and have defined PI as:

const double MathConstants::PI = atan(1.0)*4.0;

Elsewhere in my code I set a float variable:

float result = (float) (-MathConstants::PI / 2.0);

When I dump the bytes of result from memory I get DB 0F C9 BF (little endian). Converted to big endian, that is BF C9 0F DB.

According to http://babbage.cs.qc.cuny.edu/IEEE-754.old/Decimal.html that is equivalent to -1.5707964.

What I don't understand is why I get -1.5707964 instead of -1.5707963. I would expect -1.5707963, since -PI/2 is -1.5707963267948966.

Can someone enlighten me here?

Answer 1

The rounding is binary, not decimal. When you look at the result in decimal it seems wrong, because you expect the final decimal digit to be rounded, but that's not what happens. The conversion rounds the final binary bit.

I typed 1.57079632679489661923132169163975 as the constant for PI/2 into the page you linked, and it gave the correct output of 1.5707964.

Compare the 64-bit representation to the 32-bit representation:

1 .1001001000011111101101010100010001000010110100011000
1 .10010010000111111011011

The part that was cut off starts with 1 and is followed by further nonzero bits, so it amounts to more than half of the last retained bit. Correct round-to-nearest therefore requires the 32-bit result to be bumped up.

(Answer 1: 4 votes, accepted, 2012-03-21 19:51:45)
