Floating-point maximum value

Question

For the following loop, I was expecting the output to be sum = 20e6, but the output is sum = 1.67772e+07.

float sum = 0.0f;
for (int i = 0; i < 20e6; i++)
    sum = sum + 1.0f;
printf("sum = %g\n", sum);

Question 1: Why can't sum, being a float, hold values greater than 1.67772e07?

Question 2: If I change the statement in the loop to sum = sum + 1.001f; then the final value of sum is 2.32229e+07. Why does the value of sum differ like this?

Question 3: Can we control this behaviour in the loop above so that we can use float for values bigger than 1.67772e07 while still incrementing by 1.0f?

Answer 1

At some point, the closest representable value to x + 1.0f is x itself. Once that point is reached, this rounding error means your loop won't cause any further increase in sum.

As an illustration, you can observe this effect using scientific notation with a fixed number of significant figures. For example, with 4 significant figures:

    0 = 0.000e0
    1 = 1.000e0
    2 = 2.000e0
    3 = 3.000e0

...

    9 = 9.000e0
   10 = 1.000e1
   11 = 1.100e1

...

   99 = 9.900e1
  100 = 1.000e2
  101 = 1.010e2

...

  999 = 9.990e2
 1000 = 1.000e3
 1001 = 1.001e3

...

 9999 = 9.999e3
10000 = 1.000e4

If you add one more, you should get 1.0001e4, but since only 4 significant digits are preserved, the stored value is 1.000e4; that is, 10000 + 1 = 10000 in this system, and continuing to increment just repeats this calculation forever without changing the result.

Your code works exactly the same way, except that float uses binary floating-point rather than the decimal digits of scientific notation. But the number of significant binary digits is still limited, and once adding one no longer changes any of those significant digits, sum ceases to increase.

It's somewhat more complicated, because in binary the "correct" result can fall exactly midway between two representable numbers, so rounding could go either downward or upward; in the upward case you asked to add 1 but actually get a result 2 higher. In any case, once the distance between representable values becomes 4, trying to add one will have no effect.

Solution 1
8 · Accepted · 2015-08-28 16:37:17
