
Difference between double and float in floating point accuracy

After reading this question and this MSDN blog, I tried a few examples to test this:

Console.WriteLine(0.8-0.7 == 0.1);

And yes, the expected output is False. So I tried casting the expressions on both sides to float and to double to see whether I would get a different result:

Console.WriteLine((float)(0.8-0.7) == (float)(0.1));
Console.WriteLine((double)(0.8-0.7) == (double)(0.1));

The first line outputs True but the second line outputs False. Why is this happening?

Furthermore,

Console.WriteLine(8-0.7 == 7.3);
Console.WriteLine(8.0-0.7 == 7.3);

Both of the lines above give True even without casting. And...

Console.WriteLine(18.01-0.7 == 17.31);

This line outputs False. How is subtracting from 8 different from subtracting from 18.01 if a floating-point number is subtracted from both?

I've tried reading through the blog and the question, and I can't seem to find an answer elsewhere. Can someone please explain in layman's terms why all of this happens? Thank you in advance.

EDIT:

Console.WriteLine(8.001-0.001 == 8); // this returns False
Console.WriteLine(8.01-0.01 == 8); // this returns True

Note: I am using the .NET Fiddle online C# compiler.

The Cases of 0.8−0.7

In 0.8-0.7 == 0.1, none of the literals is exactly representable in double. The nearest representable values are 0.8000000000000000444089209850062616169452667236328125 for .8, 0.6999999999999999555910790149937383830547332763671875 for .7, and 0.1000000000000000055511151231257827021181583404541015625 for .1. When the first two are subtracted, the result is 0.100000000000000088817841970012523233890533447265625. As this is not equal to the third, 0.8-0.7 == 0.1 evaluates to false.
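The exact values quoted above can be reproduced with a short sketch. This is only an illustration, not part of the original analysis: the class name `ExactDouble` and the helper `ToExactString` are hypothetical, and the helper assumes a positive, finite, normal double. It decodes the IEEE-754 bit pattern and uses `BigInteger` so that no digits are lost to formatting.

```csharp
using System;
using System.Numerics;

static class ExactDouble
{
    // Hypothetical helper: exact decimal expansion of a positive, finite, normal double.
    public static string ToExactString(double value)
    {
        long bits = BitConverter.DoubleToInt64Bits(value);
        int biasedExponent = (int)((bits >> 52) & 0x7FF);
        long fraction = bits & ((1L << 52) - 1);

        // Normal numbers carry an implicit leading 1 bit above the 52 stored bits.
        BigInteger significand = new BigInteger(fraction | (1L << 52));
        int exponent = biasedExponent - 1075; // value == significand * 2^exponent

        if (exponent >= 0)
            return (significand << exponent).ToString();

        // significand / 2^n == significand * 5^n / 10^n, so the digits are exact.
        int n = -exponent;
        string digits = (significand * BigInteger.Pow(5, n)).ToString().PadLeft(n + 1, '0');
        string text = digits.Insert(digits.Length - n, ".");
        return text.TrimEnd('0').TrimEnd('.');
    }

    static void Main()
    {
        double a = 0.8, b = 0.7, c = 0.1;
        Console.WriteLine(ToExactString(a));
        Console.WriteLine(ToExactString(b));
        Console.WriteLine(ToExactString(c));
        Console.WriteLine(ToExactString(a - b));
    }
}
```

On an implementation that evaluates the subtraction in IEEE-754 double precision, the four printed lines should match the four values given in the paragraph above.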

In (float)(0.8-0.7) == (float)(0.1), the result of 0.8-0.7 and the value of 0.1 are each converted to float. The float value nearest to the former, 0.100000000000000088817841970012523233890533447265625, is 0.100000001490116119384765625. The float value nearest to the latter, 0.1000000000000000055511151231257827021181583404541015625, is also 0.100000001490116119384765625. Since these are the same, (float)(0.8-0.7) == (float)(0.1) evaluates to true.
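One way to see the two doubles collapse onto the same float is to compare raw bit patterns, which sidesteps any question of how the == comparison itself is evaluated. The following is a minimal sketch; the class name `FloatCastDemo` and the helper `FloatBits` are made up for illustration.

```csharp
using System;

static class FloatCastDemo
{
    // Hypothetical helper: raw 32-bit pattern of a float.
    // GetBytes/ToInt32 is used because it is available on older frameworks as well.
    public static int FloatBits(float f)
    {
        return BitConverter.ToInt32(BitConverter.GetBytes(f), 0);
    }

    static void Main()
    {
        double difference = 0.8 - 0.7;
        double tenth = 0.1;

        // The two doubles differ in their low bits...
        Console.WriteLine(difference == tenth);

        // ...but float keeps only 24 significand bits, so both round to the same float.
        float f1 = (float)difference;
        float f2 = (float)tenth;
        Console.WriteLine(FloatBits(f1) == FloatBits(f2));
        Console.WriteLine("0x{0:X8} 0x{1:X8}", FloatBits(f1), FloatBits(f2));
    }
}
```

Assuming the casts produce true IEEE-754 single-precision values, both bit patterns should be 0x3DCCCCCD, the float nearest to 0.1.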

In (double)(0.8-0.7) == (double)(0.1), the result of 0.8-0.7 and the value of 0.1 are each converted to double. Since they are already double, the casts have no effect, and the result is the same as for 0.8-0.7 == 0.1.

Notes

The C# specification, version 5.0, indicates that float and double are the IEEE-754 32-bit and 64-bit floating-point types. I do not see it explicitly state that they are the binary floating-point formats rather than the decimal formats, but the characteristics described make this evident. The specification also states that IEEE-754 arithmetic is generally used, with round-to-nearest (presumably round-to-nearest-ties-to-even), subject to the exception below.

The C# specification allows floating-point arithmetic to be performed with more precision than the nominal type. Clause 4.1.6 says “… Floating-point operations may be performed with higher precision than the result type of the operation…” This can complicate analysis of floating-point expressions in general, but it does not concern us in the instance of 0.8-0.7 == 0.1, because the only applicable operation is the subtraction of 0.7 from 0.8, and these numbers are in the same binade (they have the same power of two in the floating-point representation), so the result of the subtraction is exactly representable and additional precision will not change the result. As long as the conversion of the source texts 0.8, 0.7, and 0.1 to double does not use extra precision and the cast to float produces a float with no extra precision, the results will be as stated above. (The C# standard says in clause 6.2.1 that a conversion from double to float yields a float value, although it does not explicitly state that no extra precision may be used at this point.)
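To make the binade argument concrete, the power of two used by each operand can be read directly from its bit pattern. This is only a sketch under the assumption that double is IEEE-754 binary64; the class name `BinadeDemo` and the helper `Exponent` are hypothetical, and the helper handles only finite, normal values.

```csharp
using System;

static class BinadeDemo
{
    // Hypothetical helper: unbiased binary exponent e of a finite, normal double,
    // where |value| == 1.fraction * 2^e.
    public static int Exponent(double value)
    {
        long bits = BitConverter.DoubleToInt64Bits(value);
        return (int)((bits >> 52) & 0x7FF) - 1023;
    }

    static void Main()
    {
        // 0.8 and 0.7 both lie in [0.5, 1), so they share the exponent -1.
        Console.WriteLine(Exponent(0.8)); // -1
        Console.WriteLine(Exponent(0.7)); // -1

        // 8 lies in [8, 16): a different binade from 0.7, so 8-0.7 may need rounding.
        Console.WriteLine(Exponent(8.0)); // 3
    }
}
```

Operands with the same exponent have their significand bits aligned, which is why their difference fits in a double without rounding; that guarantee does not carry over to 8 and 0.7, whose exponents differ by four.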

Additional Cases

In 8-0.7 == 7.3, we have 8 for 8, 7.29999999999999982236431605997495353221893310546875 for 7.3, and 0.6999999999999999555910790149937383830547332763671875 for 0.7. The exact difference, 7.3000000000000000444089209850062616169452667236328125, is not representable in double; it rounds to the nearest double, 7.29999999999999982236431605997495353221893310546875, which is the same value used for 7.3, so the result is true.

Note that the additional precision allowed by the C# specification could affect the result of 8-0.7. A C# implementation that used extra precision for this operation could produce false for this case, as it would get a different result for 8-0.7.

In 18.01-0.7 == 17.31, we have 18.010000000000001563194018672220408916473388671875 for 18.01, 0.6999999999999999555910790149937383830547332763671875 for 0.7, 17.309999999999998721023075631819665431976318359375 for 17.31, and 17.31000000000000227373675443232059478759765625 for 18.01-0.7, so the result is false.
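The two additional cases can be compared by counting how many representable doubles separate each computed difference from the corresponding literal. The sketch below assumes the subtraction is carried out in plain IEEE-754 double precision (as on typical 64-bit runtimes); an implementation using the extra precision discussed above could behave differently. The class name `UlpDistanceDemo` and the helper `UlpsApart` are made up for this illustration.

```csharp
using System;

static class UlpDistanceDemo
{
    // Hypothetical helper: signed count of representable doubles from y up to x.
    // Valid for positive finite doubles, whose bit patterns are ordered like their values.
    public static long UlpsApart(double x, double y)
    {
        return BitConverter.DoubleToInt64Bits(x) - BitConverter.DoubleToInt64Bits(y);
    }

    static void Main()
    {
        double eight = 8, eighteenPointZeroOne = 18.01, sevenTenths = 0.7;

        // 8-0.7 rounds to exactly the double used for 7.3.
        Console.WriteLine(UlpsApart(eight - sevenTenths, 7.3));

        // 18.01-0.7 rounds to the double just above the one used for 17.31.
        Console.WriteLine(UlpsApart(eighteenPointZeroOne - sevenTenths, 17.31));
    }
}
```

Under that assumption the first line prints 0 and the second prints 1: the mismatch in the 18.01 case is a single step in the last bit of the significand.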

How is subtracting from 8 different from subtracting from 18.01 if a floating-point number is subtracted from both?

18.01 is larger than 8 and requires a greater power of two in its floating-point representation. Similarly, the result of 18.01-0.7 is larger than that of 8-0.7. This means the bits in their significands (the fraction portion of the floating-point representation, which is scaled by the power of two) represent greater values, causing the rounding errors in the floating-point operations to be generally greater. In general, a floating-point format has a fixed span: there is a fixed distance from the high bit retained to the low bit retained. When you change to numbers with more bits on the left (high bits), some bits on the right (low bits) are pushed out, and the results change.
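The "fixed span" can be illustrated by measuring the gap between a double and the next larger double at each magnitude. This is a sketch only; the class name `SpacingDemo` and the helper `Ulp` are hypothetical, and the helper assumes a positive finite double below the largest finite value.

```csharp
using System;

static class SpacingDemo
{
    // Hypothetical helper: gap between a positive finite double and the next larger one.
    // Adding 1 to the bit pattern steps to the adjacent representable value.
    public static double Ulp(double value)
    {
        long bits = BitConverter.DoubleToInt64Bits(value);
        return BitConverter.Int64BitsToDouble(bits + 1) - value;
    }

    static void Main()
    {
        // Results near 7.3 lie in [4, 8): the lowest retained bit is worth 2^-50.
        Console.WriteLine(Ulp(7.3) == 1.0 / (1L << 50));

        // Results near 17.31 lie in [16, 32): the lowest retained bit is worth 2^-48.
        Console.WriteLine(Ulp(17.31) == 1.0 / (1L << 48));

        // Two more bits on the left means two fewer on the right: a grid four times coarser.
        Console.WriteLine(Ulp(17.31) / Ulp(7.3));
    }
}
```

The coarser grid near 17.31 is what leaves room for 18.01-0.7 and 17.31 to land on different neighboring doubles.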
