Dodging the inaccuracy of a floating point number

Question

I totally understand the problems associated with floating-point numbers, but I have seen a very interesting behavior that I can't explain.

#include <stdio.h>

int main(void) {
    float x = 1028.25478;
    long int y = 102825478;
    float z = y/(float)100000.0;
    printf("x = %f ", x);
    printf("z = %f\n", z);
    return 0;
}

The output is:

x = 1028.254761 z = 1028.254780

Now, if floating-point numbers cannot represent that specific value (1028.25478) exactly when I assign it to variable x, why isn't the same true for variable z?

PS: I'm using the Pelles C IDE to test the code (C11 compiler).

Answer 1

I am pretty sure that what happens here is that the latter floating-point variable (z) is elided and instead kept in a double-precision register, and then passed as-is as an argument to printf. The compiler believes this is safe because a float argument to printf would be promoted to double by the default argument promotions anyway.

I managed to produce a similar result using GCC 7.2.0, with these switches:

-Wall -Werror -ffast-math -m32 -funsafe-math-optimizations -fexcess-precision=fast -O3

The output is

x = 1028.254761 z = 1028.254800

The number is slightly different there.

The GCC documentation for -fexcess-precision says:

-fexcess-precision=style

This option allows further control over excess precision on machines where floating-point operations occur in a format with more precision or range than the IEEE standard and interchange floating-point types. By default, -fexcess-precision=fast is in effect; this means that operations may be carried out in a wider precision than the types specified in the source if that would result in faster code, and it is unpredictable when rounding to the types specified in the source code takes place. When compiling C, if -fexcess-precision=standard is specified then excess precision follows the rules specified in ISO C99; in particular, both casts and assignments cause values to be rounded to their semantic types (whereas -ffloat-store only affects assignments). This option [ -fexcess-precision=standard ] is enabled by default for C if a strict conformance option such as -std=c99 is used. -ffast-math enables -fexcess-precision=fast by default regardless of whether a strict conformance option is used.

This behaviour isn't C11-compliant, since C11 requires assignments and casts to remove any extra range and precision.

Answer 2

Restricting this to IEEE754 strict floating point, the answers should be the same.

As a float, 1028.25478 is actually 1028.2547607421875. That accounts for x.

In the evaluation of y / (float)100000.0, y is converted to float by C's usual arithmetic conversions. The closest float to 102825478 is 102825480. IEEE 754 requires a division to return the correctly rounded result, which here is 1028.2547607421875, the float closest to the exact quotient 1028.2548. So z should hold exactly the same value as x.

So my answer is at odds with your observed behaviour. I put that down to your compiler not implementing floating point strictly, or perhaps not implementing IEEE 754.

Answer 3

The code acts as if z were a double and y/(float)100000.0 were y/100000.0:

#include <stdio.h>

int main(void) {
    float x = 1028.25478;
    long int y = 102825478;
    double z = y/100000.0;
    printf("x = %f ", x);
    printf("z = %f\n", z);
    return 0;
}
// output: x = 1028.254761 z = 1028.254780

An important consideration is FLT_EVAL_METHOD, which allows some floating-point expressions to be evaluated at higher precision than their type.

#include <float.h>
#include <stdio.h>
int main(void) {
    printf("FLT_EVAL_METHOD %d\n", FLT_EVAL_METHOD);
    return 0;
}

Except for assignment and cast ..., the values yielded by operators with floating operands and values subject to the usual arithmetic conversions and of floating constants are evaluated to a format whose range and precision may be greater than required by the type. The use of evaluation formats is characterized by the implementation-defined value of FLT_EVAL_METHOD .

-1 indeterminable;
0 evaluate all operations and constants just to the range and precision of the type;
1 evaluate ... type float and double to the range and precision of the double type, evaluate long double ... to the range and precision of the long double type;
2 evaluate all ... to the range and precision of the long double type.

Yet this does not apply here: with float z = y/(float)100000.0;, z should lose all extra precision on the assignment.

I agree with @Antti Haapala that the code is using a speed optimization that adheres less strictly to the expected rules of floating-point math.

3 answers

solution1: 5, ACCEPTED, 2017-12-21 15:23:07
solution2: 2, 2017-12-21 15:26:46
solution3: 2, 2017-12-21 16:14:50
