
Difference between double and float in floating point accuracy

After reading this question and this MSDN blog, I tried a few examples to test this:

Console.WriteLine(0.8-0.7 == 0.1);

And yes, the expected output is False. So I tried casting the expressions on both sides to float and to double, to see whether I could get a different result:

Console.WriteLine((float)(0.8-0.7) == (float)(0.1));
Console.WriteLine((double)(0.8-0.7) == (double)(0.1));

The first line outputs True but the second line outputs False. Why is this happening?

Furthermore,

Console.WriteLine(8-0.7 == 7.3);
Console.WriteLine(8.0-0.7 == 7.3);

Both of the lines above give True even without casting. And...

Console.WriteLine(18.01-0.7 == 17.31);

This line outputs False. How is subtracting 0.7 from 8 different from subtracting it from 18.01, if both subtract the same floating-point number?

I've tried to read through the blog and the question, but I can't seem to find the answer elsewhere. Can someone please explain why all of this is happening, in layman's terms? Thank you in advance.

EDIT:

Console.WriteLine(8.001-0.001 == 8); // this prints False
Console.WriteLine(8.01-0.01 == 8);   // this prints True

Note: I am using .NET fiddle online c# compiler.

The Cases of 0.8−0.7

In 0.8-0.7 == 0.1 , none of the literals are exactly representable in double . The nearest representable values are 0.8000000000000000444089209850062616169452667236328125 for .8, 0.6999999999999999555910790149937383830547332763671875 for .7, and 0.1000000000000000055511151231257827021181583404541015625 for .1. When the first two are subtracted, the result is 0.100000000000000088817841970012523233890533447265625. As this is not equal to the third, 0.8-0.7 == 0.1 evaluates to false.
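These exact values can be reproduced outside C#, since any language using IEEE-754 binary64 rounds the same way. As a sketch in Python (whose float is also binary64), constructing a Decimal from a float reveals the exact value each literal rounds to:

```python
from decimal import Decimal

# Decimal(x) shows the exact binary64 value a literal rounds to.
print(Decimal(0.8))        # 0.8000000000000000444089209850062616169452667236328125
print(Decimal(0.7))        # 0.6999999999999999555910790149937383830547332763671875
print(Decimal(0.1))        # 0.1000000000000000055511151231257827021181583404541015625
print(Decimal(0.8 - 0.7))  # 0.100000000000000088817841970012523233890533447265625
print(0.8 - 0.7 == 0.1)    # False
```

The subtraction result and the literal 0.1 differ in their low bits, so the comparison is false.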

In (float)(0.8-0.7) == (float)(0.1) , the results of 0.8-0.7 and 0.1 are each converted to float . The float value nearest to the former, 0.100000000000000088817841970012523233890533447265625, is 0.100000001490116119384765625. The float value nearest to the latter, 0.1000000000000000055511151231257827021181583404541015625, is also 0.100000001490116119384765625. Since these are the same, (float)(0.8-0.7) == (float)(0.1) evaluates to true.
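The rounding performed by the (float) cast can be simulated in Python with a hypothetical helper (to_float32 is not a standard function; it uses struct to round a binary64 value to the nearest binary32, which is what C#'s conversion to float does):

```python
import struct
from decimal import Decimal

def to_float32(x):
    # Round a binary64 value to the nearest binary32 by packing and unpacking
    # it as an IEEE-754 single-precision value.
    return struct.unpack('f', struct.pack('f', x))[0]

a = to_float32(0.8 - 0.7)
b = to_float32(0.1)
print(Decimal(a))  # 0.100000001490116119384765625
print(Decimal(b))  # the same value
print(a == b)      # True
```

Both doubles are so close together that the much coarser float grid cannot tell them apart.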

In (double)(0.8-0.7) == (double)(0.1) , the results of 0.8-0.7 and 0.1 are each converted to double . Since they are already double , there is no effect, and the result is the same as for 0.8-0.7 == 0.1 .

Notes

The C# specification, version 5.0, indicates that float and double are the IEEE-754 32-bit and 64-bit floating-point types. I do not see it explicitly state that they are the binary floating-point formats rather than the decimal formats, but the characteristics described make this evident. The specification also states that IEEE-754 arithmetic is generally used, with round-to-nearest (presumably round-to-nearest-ties-to-even), subject to the exception below.

The C# specification allows floating-point arithmetic to be performed with more precision than the nominal type. Clause 4.1.6 says “… Floating-point operations may be performed with higher precision than the result type of the operation…” This can complicate analysis of floating-point expressions in general, but it does not concern us in the instance of 0.8-0.7 == 0.1 because the only applicable operation is the subtraction of 0.7 from 0.8 , and these numbers are in the same binade (have the same power of two in the floating-point representation), so the result of the subtraction is exactly representable and additional precision will not change the result. As long as the conversion of the source texts 0.8 , 0.7 , and 0.1 to double does not use extra precision and the cast to float produces a float with no extra precision, the results will be as stated above. (The C# standard says in clause 6.2.1 that a conversion from double to float yields a float value, although it does not explicitly state that no extra precision may be used at this point.)

Additional Cases

In 8-0.7 == 7.3 , we have 8 for 8 , 7.29999999999999982236431605997495353221893310546875 for 7.3 , 0.6999999999999999555910790149937383830547332763671875 for 0.7 , and 7.29999999999999982236431605997495353221893310546875 for 8-0.7 , so the result is true.
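Hedged again as a Python sketch (binary64, the same format as C#'s double): here the subtraction rounds to exactly the same double as the literal on the right-hand side.

```python
from decimal import Decimal

# 8 is exact, and 8 - 0.7 rounds to the same double as the literal 7.3.
print(Decimal(8 - 0.7))  # 7.29999999999999982236431605997495353221893310546875
print(Decimal(7.3))      # the identical exact value
print(8 - 0.7 == 7.3)    # True
```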

Note that the additional precision allowed by the C# specification could affect the result of 8-0.7 . A C# implementation that used extra precision for this operation could produce false for this case, as it would get a different result for 8-0.7 .

In 18.01-0.7 == 17.31 , we have 18.010000000000001563194018672220408916473388671875 for 18.01 , 0.6999999999999999555910790149937383830547332763671875 for 0.7 , 17.309999999999998721023075631819665431976318359375 for 17.31 , and 17.31000000000000227373675443232059478759765625 for 18.01-0.7 , so the result is false.
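The same Python sketch shows the mismatch: the rounded subtraction result and the rounded literal are adjacent doubles, one ulp apart.

```python
from decimal import Decimal

# Here the rounded subtraction result and the rounded literal differ by one ulp.
print(Decimal(18.01 - 0.7))  # 17.31000000000000227373675443232059478759765625
print(Decimal(17.31))        # 17.309999999999998721023075631819665431976318359375
print(18.01 - 0.7 == 17.31)  # False
```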

How is subtracting 0.7 from 8 different from subtracting it from 18.01, if both subtract the same floating-point number?

18.01 is larger than 8 and requires a greater power of two in its floating-point representation. Similarly, the result of 18.01-0.7 is larger than that of 8-0.7 . This means the bits in their significands (the fraction portion of the floating-point representation, which is scaled by the power of two) represent greater values, causing the rounding errors in the floating-point operations to be generally greater. In general, a floating-point format has a fixed span—there is a fixed distance from the high bit retained to the low bit retained. When you change to numbers with more bits on the left (high bits), some bits on the right (low bits) are pushed out, and the results change.
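The growth of that spacing can be seen directly with Python's math.ulp (available since Python 3.9), which returns the gap from a value to the adjacent representable double:

```python
import math

# One ulp at 7.3 (binade [4, 8)) vs. one ulp at 17.31 (binade [16, 32)).
print(math.ulp(7.3))    # 2**-50, about 8.88e-16
print(math.ulp(17.31))  # 2**-48, about 3.55e-15, four times larger
```

Because 17.31 sits two binades above 7.3, its rounding error can be up to four times as large, which is why the 18.01 case misses while the 8 case happens to land exactly.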
