Why does an IEEE 754 single-precision floating point number have only 7 digits of precision?

Question

Why does a single-precision floating point number have 7 digits of precision (or a double 15-16 digits of precision)?

Can anyone please explain how we arrive at that based on the 32 bits assigned to a float (Sign (31), Exponent (30-23), Fraction (22-0))?

Answer 1

23 fraction bits (22-0) of the significand appear in the memory format, but the total precision is actually 24 bits since we assume there is a leading 1. This is equivalent to log10(2^24) ≈ 7.225 decimal digits.

A double-precision float has 52 bits in the fraction; with the leading 1 that makes 53. Therefore a double can hold log10(2^53) ≈ 15.955 decimal digits, not quite 16.
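The two figures above can be checked with a few lines of Java. This is only a small sketch; the class name DecimalDigits and the method digits are made up for illustration, and the computation is just bits * log10(2).

```java
final class DecimalDigits {
    // Decimal digits equivalent to a significand of the given width:
    // log10(2^bits) = bits * log10(2).
    static double digits(int significandBits) {
        return significandBits * Math.log10(2.0);
    }

    public static void main(String[] args) {
        // 24 = 23 stored fraction bits + the implicit leading 1 (float)
        System.out.printf("float  (24 bits): %.3f digits%n", digits(24));
        // 53 = 52 stored fraction bits + the implicit leading 1 (double)
        System.out.printf("double (53 bits): %.3f digits%n", digits(53));
    }
}
```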

Note: The leading 1 is not a sign bit. The value is actually (-1)^sign * 1.ffffffff * 2^(eeee-constant), but we need not store the leading 1 in the fraction. The sign bit must still be stored.
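To make the layout concrete, here is a rough sketch that pulls the three fields out of a float and rebuilds the value from the formula above. The class and method names are hypothetical, the "constant" is taken to be the single-precision bias of 127, and the rebuild step only handles normal numbers (it ignores zero, subnormals, infinities and NaN).

```java
final class FloatFields {
    // Sign: bit 31.
    static int sign(float f) {
        return Float.floatToIntBits(f) >>> 31;
    }

    // Biased exponent: bits 30-23.
    static int biasedExponent(float f) {
        return (Float.floatToIntBits(f) >>> 23) & 0xFF;
    }

    // Stored fraction: bits 22-0. The leading 1 is not stored here.
    static int fraction(float f) {
        return Float.floatToIntBits(f) & 0x7FFFFF;
    }

    // (-1)^sign * 1.fraction * 2^(exponent - 127), normal numbers only.
    static double rebuild(float f) {
        double significand = 1.0 + fraction(f) / (double) (1 << 23);
        double magnitude = significand * Math.pow(2.0, biasedExponent(f) - 127);
        return sign(f) == 0 ? magnitude : -magnitude;
    }

    public static void main(String[] args) {
        float f = -6.25f; // -1.5625 * 2^2
        System.out.println("sign     = " + sign(f));
        System.out.println("exponent = " + biasedExponent(f));
        System.out.println("fraction = 0x" + Integer.toHexString(fraction(f)));
        System.out.println("rebuilt  = " + rebuild(f));
    }
}
```

The significand used in rebuild has 24 significant bits even though only 23 are read from memory, which is where the 24 in log10(2^24) comes from.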

There are some numbers that cannot be represented as a finite sum of powers of 2, such as 1/9:

>>>> double d = 0.111111111111111;
>>>> System.out.println(d + "\n" + d*10);
0.111111111111111
1.1111111111111098

If a financial program were to do this calculation over and over without self-correcting, there would eventually be discrepancies.

>>>> double d = 0.111111111111111;
>>>> double sum = 0;
>>>> for(int i=0; i<1000000000; i++) {sum+=d;}
>>>> System.out.println(sum);
111111108.91914201

After 1 billion summations, we are missing over $2.
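The answer does not say what "self-correcting" would look like. One possible reading, offered here purely as an illustration and not as the answerer's method, is compensated (Kahan) summation, which carries the rounding error of each addition into the next one. The class name CompensatedSum and its methods are made up, and the loop count is reduced to 10 million so the sketch runs quickly.

```java
final class CompensatedSum {
    // Plain repeated addition, as in the loop above.
    static double naiveSum(double d, int n) {
        double sum = 0.0;
        for (int i = 0; i < n; i++) {
            sum += d;
        }
        return sum;
    }

    // Kahan summation: c holds the low-order bits lost by the previous add.
    static double kahanSum(double d, int n) {
        double sum = 0.0;
        double c = 0.0;
        for (int i = 0; i < n; i++) {
            double y = d - c;      // apply the correction from the last step
            double t = sum + y;    // low bits of y may be lost here
            c = (t - sum) - y;     // recover what was lost (negated)
            sum = t;
        }
        return sum;
    }

    public static void main(String[] args) {
        double d = 0.111111111111111;
        int n = 10_000_000;
        System.out.println("naive : " + naiveSum(d, n));
        System.out.println("kahan : " + kahanSum(d, n));
        System.out.println("d * n : " + d * n);
    }
}
```

For money specifically, a decimal type such as java.math.BigDecimal avoids the binary representation problem altogether, at some cost in speed.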

Answer 2

A 32-bit float has 23 fraction bits, so the smallest unit of the fraction is

2^(-23) = 0.00000011920928955078125

No fraction value can be smaller than 0.00000011920928955078125; every other value is larger, and each one is a whole multiple of that unit:

0.00000011920928955078125 * n

So we can easily express every value of the form 0.00000x (x from 1 to 9), and float32 certainly has 6 digits of precision. Ignoring rounding (that is, truncating), the seventh digit works out as below:

0.00000011920928955078125 * 1 = 0.0000001
0.00000011920928955078125 * 2 = 0.0000002
0.00000011920928955078125 * 3 = 0.0000003
0.00000011920928955078125 * 4 = 0.0000004
0.00000011920928955078125 * 5 = 0.0000005
0.00000011920928955078125 * 6 = 0.0000007
0.00000011920928955078125 * 7 = 0.0000008
0.00000011920928955078125 * 8 = 0.0000009
0.00000011920928955078125 * 9 = 0.000001

It can't express 0.0000006. This is why float32 is said to have 6 to 7 digits of precision, a figure that can be found everywhere on the internet.
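The 2^(-23) unit and the 6 to 7 digit limit can both be observed directly in Java. The following is only a sketch with made-up names (FloatSpacing, ulpAtOne, sameFloat); it relies on Math.ulp, which returns the gap between a float and the next larger one.

```java
final class FloatSpacing {
    // Gap between 1.0f and the next float above it; this is 2^(-23).
    static float ulpAtOne() {
        return Math.ulp(1.0f);
    }

    // True when two different ints round to the same float value.
    static boolean sameFloat(int a, int b) {
        return (float) a == (float) b;
    }

    public static void main(String[] args) {
        System.out.println("ulp(1.0f)            = " + ulpAtOne());
        System.out.println("2^-23                = " + (float) Math.pow(2, -23));
        // 7-digit integers are still distinct as floats...
        System.out.println("9999998 vs 9999999   : " + sameFloat(9999998, 9999999));
        // ...but above 2^24 = 16777216 neighbouring integers start to collide.
        System.out.println("16777216 vs 16777217 : " + sameFloat(16777216, 16777217));
    }
}
```

Every 7-digit integer survives the conversion, while some 8-digit integers do not, which matches the log10(2^24) ≈ 7.225 figure from Answer 1.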

Answer 1
9 Accepted 2013-10-02 05:20:00

Answer 2
0 2019-11-12 08:12:11
