Why does returning a floating-point value change its value?

Question

The following code raises the assert on Red Hat 5.4 32 bits but works on Red Hat 5.4 64 bits (or CentOS).

On 32 bits, I must put the return value of millis2seconds in a variable, otherwise the assert is raised, showing that the value of the double returned from the function is different from the one that was passed to it.

If you comment the "#define BUG" line, it works.

Thanks to @R, passing the -msse2 -mfpmath options to the compiler make both variants of the millis2seconds function work.

/*
 * TestDouble.cpp
 */

#include <assert.h>
#include <stdint.h>
#include <stdio.h>

static double millis2seconds(int millis) {
#define BUG
#ifdef BUG
    // following is not working on 32 bits architectures for any values of millis
    // on 64 bits architecture, it works
    return (double)(millis) / 1000.0;
#else
    //  on 32 bits architectures, we must do the operation in 2 steps ?!? ...
    // 1- compute a result in a local variable, and 2- return the local variable
    // why? somebody can explains?
    double result = (double)(millis) / 1000.0;
    return result;
#endif
}

static void testMillis2seconds() {
    int millis = 10;
    double seconds = millis2seconds(millis);

    printf("millis                  : %d\n", millis);
    printf("seconds                 : %f\n", seconds);
    printf("millis2seconds(millis)  : %f\n", millis2seconds(millis));
    printf("seconds <  millis2seconds(millis)  : %d\n", seconds < millis2seconds(millis));
    printf("seconds >  millis2seconds(millis)  : %d\n", seconds > millis2seconds(millis));
    printf("seconds == millis2seconds(millis)  : %d\n", seconds == millis2seconds(millis));

    assert(seconds == millis2seconds(millis));
}

extern int main(int argc, char **argv) {
    testMillis2seconds();
}

Answer 1

With the cdecl calling convention, which is used on Linux x86 systems, a double is returned from a function using the st0 x87 register. All x87 registers are 80-bit precision. With this code:

static double millis2seconds(int millis) {
    return (double)(millis) / 1000.0;
};

The compiler calculates the division using 80-bit precision. When gcc is using the GNU dialect of the standard (which it does by default), it leaves the result in the st0 register, so the full precision is returned back to the caller. The end of the assembly code looks like this:

fdivrp  %st, %st(1)  # Divide st0 by st1 and store the result in st0
leave
ret                  # Return

With this code,

static double millis2seconds(int millis) {
    double result = (double)(millis) / 1000.0;
    return result;
}

the result is stored into a 64-bit memory location, which loses some precision. The 64-bit value is loaded back into the 80-bit st0 register before returning, but the damage is already done:

fdivrp  %st, %st(1)   # Divide st0 by st1 and store the result in st0
fstpl   -8(%ebp)      # Store st0 onto the stack
fldl    -8(%ebp)      # Load st0 back from the stack
leave
ret                   # Return

In your main, the first result is stored in a 64-bit memory location, so the extra precision is lost either way:

double seconds = millis2seconds(millis);

but in the second call, the return value is used directly, so the compiler can keep it in a register:

assert(seconds == millis2seconds(millis));

When using the first version of millis2seconds , you end up comparing the value that has been truncated to 64-bit precision to the value with full 80-bit precision, so there is a difference.

On x86-64, calculations are done using SSE registers, which are only 64-bit, so this issue doesn't come up.

Also, if you use -std=c99 so that you don't get the GNU dialect, the calculated values are stored in memory and re-loaded into the register before returning so as to be standard-conforming.

Answer 2

On i386 (32-bit x86), all floating point expressions are evaluated as an 80-bit IEEE-extended floating point type. This is reflected in FLT_EVAL_METHOD , from float.h, being defined as 2. Storing the result to a variable or applying a cast to the result drops the excess precision via rounding, but that's still not sufficient to guarantee the same result you would see on an implementation (like x86_64) without excess precision, since rounding twice can give different results than performing a computation and rounding in the same step.

One way around this problem is to build using SSE math even on x86 targets, with -msse2 -mfpmath=sse .

Answer 3

It's worth noting first of all that since the function is implicitly pure and called twice with a constant argument the compiler would be within its rights to elide the computation and the comparison altogether.

clang-3.0-6ubuntu3 does eliminate the pure function call with -O9, and does all the floating-point calculations at compile time, so the program succeeds.

The C99 standard, ISO/IEC 9899 , says

The values of floating operands and the results of floating expressions may be represented in greater precision and range than that required by the type; the types are not changed thereby.

So the compiler is free to pass back an 80-bit value, as others have described. However, the standard goes on to say:

The cast and assignment operators are still required to perform their specified conversions.

This explains why specifically assigning to a double forces the value down to 64-bits and returning as double from a function does not. That is quite surprising it to me.

However, it looks like the C11 standard will actually make this less confusing by adding this text:

If the return expression is evaluated in a floating-point format different from the return type, the expression is converted as if by assignment [which removes any extra range and precision] to the return type of the function and the resulting value is returned to the caller.

So this code is basically exercising unspecified behavior as to whether the value does get truncated or not at various points.

For me, on Ubuntu Precise, with -m32 :

clang passes
clang -O9 also passes
gcc , assertion fails
gcc -O9 passes, because it also is eliminating the constant expressions
gcc -std=c99 fails
gcc -std=c1x also fails (but it may work on a later gcc)
gcc -ffloat-store passes, but seems to have the side-effect of constant elimination

I don't think this is a gcc bug because the standard allows this behavior but the clang behavior is nicer.

Answer 4

In addition to all the details explained in other answers, I would say that there is a very simple rule concerning use of floating point types in almost any programming language since Fortran: never check floating point values for precise equality . All the knowledge about 80-bit and 64-bit values is true, but it is true for a certain hardware and a certain compiler (yes, if you change the compiler or even turn the optimizations on or off, something may change). The more general rule (applicable to any code that is intended to be portable ) is that floating point values generally are not like integers or sequences of bytes, and can be changed, eg when copied, and checking them for equality often has unpredictable results.

So, even if it works in a test, usually it is better not to do so. It may fail later when something changes.

UPD: Though some people have downvoted, I insist the recommendation is generally correct. Things that seem to be just copying a value (they look so from a high level programming language programmer point of view; what happens in the initial example is a typical example, the value is returned and put into a variable and -- voila -- it is changed!), MAY change floating point values. Comparing floating point values for equality or inequality is often a bad practice that may be allowed ONLY if you know why you may do that in your certain case. And writing portable programs usually requires to minimize low-level knowledge. Yes, it is very unlikely that integer values like 0 or 1 are changed when put into a floating point variable or copied. But more complex values (in the example above we see what happens to a result of a simple arithmetic expression!) may.

Why does returning a floating-point value change its value?

Question

4 answers

solution1
36 ACCPTED 2013-06-03 01:04:47

solution2
8 2013-06-03 00:38:25

solution3
3 2013-06-03 00:38:02

solution4
2 2013-06-03 03:08:43

Why does returning a floating-point value change its value?

Question

4 answers

solution1 36 ACCPTED 2013-06-03 01:04:47

solution2 8 2013-06-03 00:38:25

solution3 3 2013-06-03 00:38:02

solution4 2 2013-06-03 03:08:43

solution1
36 ACCPTED 2013-06-03 01:04:47

solution2
8 2013-06-03 00:38:25

solution3
3 2013-06-03 00:38:02

solution4
2 2013-06-03 03:08:43