Why does returning a floating-point value change its value?

Question

The following code raises the assert on 32-bit Red Hat 5.4 but works on 64-bit Red Hat 5.4 (or CentOS).

On 32 bits, I must put the return value of millis2seconds in a variable; otherwise the assert is raised, showing that the double returned from the function differs from the one that was computed inside it.

If you comment out the "#define BUG" line, it works.

Thanks to @R: passing the -msse2 -mfpmath=sse options to the compiler makes both variants of the millis2seconds function work.

/*
 * TestDouble.cpp
 */

#include <assert.h>
#include <stdint.h>
#include <stdio.h>

static double millis2seconds(int millis) {
#define BUG
#ifdef BUG
    // the following does not work on 32-bit architectures for any value of millis
    // on 64-bit architectures, it works
    return (double)(millis) / 1000.0;
#else
    // on 32-bit architectures, we must do the operation in 2 steps ?!? ...
    // 1- compute the result in a local variable, and 2- return the local variable
    // why? can somebody explain?
    double result = (double)(millis) / 1000.0;
    return result;
#endif
}

static void testMillis2seconds() {
    int millis = 10;
    double seconds = millis2seconds(millis);

    printf("millis                  : %d\n", millis);
    printf("seconds                 : %f\n", seconds);
    printf("millis2seconds(millis)  : %f\n", millis2seconds(millis));
    printf("seconds <  millis2seconds(millis)  : %d\n", seconds < millis2seconds(millis));
    printf("seconds >  millis2seconds(millis)  : %d\n", seconds > millis2seconds(millis));
    printf("seconds == millis2seconds(millis)  : %d\n", seconds == millis2seconds(millis));

    assert(seconds == millis2seconds(millis));
}

extern int main(int argc, char **argv) {
    testMillis2seconds();
}

Answer 1

With the cdecl calling convention, which is used on Linux x86 systems, a double is returned from a function in the st0 x87 register. All x87 registers have 80-bit precision. With this code:

static double millis2seconds(int millis) {
    return (double)(millis) / 1000.0;
}

The compiler calculates the division using 80-bit precision. When gcc is using the GNU dialect of the standard (which it does by default), it leaves the result in the st0 register, so the full precision is returned to the caller. The end of the assembly code looks like this:

fdivrp  %st, %st(1)  # Divide st0 by st1 and store the result in st0
leave
ret                  # Return

With this code,

static double millis2seconds(int millis) {
    double result = (double)(millis) / 1000.0;
    return result;
}

the result is stored into a 64-bit memory location, which loses some precision. The 64-bit value is loaded back into the 80-bit st0 register before returning, but the damage is already done:

fdivrp  %st, %st(1)   # Divide st0 by st1 and store the result in st0
fstpl   -8(%ebp)      # Store st0 onto the stack
fldl    -8(%ebp)      # Load st0 back from the stack
leave
ret                   # Return

In your main, the first result is stored in a 64-bit memory location, so the extra precision is lost either way:

double seconds = millis2seconds(millis);

but in the second call, the return value is used directly, so the compiler can keep it in a register:

assert(seconds == millis2seconds(millis));

When using the first version of millis2seconds, you end up comparing a value that has been rounded to 64-bit precision with a value that still has full 80-bit precision, so there is a difference.

On x86-64, calculations are done using SSE instructions, which operate on doubles at 64-bit precision, so this issue doesn't come up.

Also, if you use -std=c99 so that you don't get the GNU dialect, the calculated values are stored in memory and reloaded into the register before returning, so as to be standard-conforming.
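The store-and-reload effect described in this answer can also be forced by hand. The following is a minimal sketch, not code from the answer: round_to_double and check_roundtrip are hypothetical helper names, and the sketch assumes that assigning to a volatile double makes the compiler write the value to a 64-bit memory slot and read it back, which discards any extra x87 precision before the value is returned.

```c
#include <assert.h>

/* Hypothetical helper (not in the original question): push a value through
 * a 64-bit memory slot. The volatile qualifier forces a real store and a
 * real reload, so any excess x87 precision is rounded away here. */
static double round_to_double(double x) {
    volatile double stored = x;
    return stored;
}

/* Same computation as the BUG variant, but rounded before returning. */
static double millis2seconds(int millis) {
    return round_to_double((double)millis / 1000.0);
}

/* Mirrors the assert in the question: both sides now hold a value that
 * has already been rounded to double. */
static void check_roundtrip(int millis) {
    double seconds = millis2seconds(millis);
    assert(seconds == millis2seconds(millis));
}
```

Calling check_roundtrip(10) from main should then no longer trip the assert on a 32-bit x87 build, at the cost of an extra store and load on every call.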

Answer 2

On i386 (32-bit x86), all floating point expressions are evaluated as an 80-bit IEEE-extended floating point type. This is reflected in FLT_EVAL_METHOD, from float.h, being defined as 2. Storing the result to a variable or applying a cast to the result drops the excess precision via rounding, but that's still not sufficient to guarantee the same result you would see on an implementation (like x86_64) without excess precision, since rounding twice can give different results than performing a computation and rounding in the same step.

One way around this problem is to build using SSE math even on x86 targets, with -msse2 -mfpmath=sse.
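To see which evaluation mode a given build uses, FLT_EVAL_METHOD can be inspected directly. This is a small sketch rather than part of the answer: eval_method_name and current_eval_method_name are hypothetical helper names, and the descriptions paraphrase the meanings C99 gives to the values 0, 1 and 2.

```c
#include <assert.h>
#include <float.h>
#include <stddef.h>

/* Map a FLT_EVAL_METHOD value to the meaning C99 assigns to it. */
static const char *eval_method_name(int method) {
    switch (method) {
    case 0:  return "each operation is evaluated in the precision of its type";
    case 1:  return "float and double are evaluated as double";
    case 2:  return "everything is evaluated as long double";
    default: return "indeterminable or implementation-defined";
    }
}

/* Describe the evaluation method of the current build. */
static const char *current_eval_method_name(void) {
    const char *name = eval_method_name(FLT_EVAL_METHOD);
    assert(name != NULL);
    return name;
}
```

Printing current_eval_method_name() from main, for example with puts, would be expected to report the long double case on a default 32-bit x87 build and the first case when building with -msse2 -mfpmath=sse.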

Answer 3

It's worth noting first of all that, since the function is implicitly pure and is called twice with a constant argument, the compiler would be within its rights to elide the computation and the comparison altogether.

clang-3.0-6ubuntu3 does eliminate the pure function call with -O9, and does all the floating-point calculations at compile time, so the program succeeds.

The C99 standard, ISO/IEC 9899, says:

The values of floating operands and the results of floating expressions may be represented in greater precision and range than that required by the type; the types are not changed thereby.

So the compiler is free to pass back an 80-bit value, as others have described. However, the standard goes on to say:

The cast and assignment operators are still required to perform their specified conversions.

This explains why explicitly assigning to a double forces the value down to 64 bits, while returning a double from a function does not. That is quite surprising to me.

However, it looks like the C11 standard will actually make this less confusing by adding this text:

If the return expression is evaluated in a floating-point format different from the return type, the expression is converted as if by assignment [which removes any extra range and precision] to the return type of the function and the resulting value is returned to the caller.

So this code is basically exercising unspecified behavior as to whether the value gets truncated or not at various points.

For me, on Ubuntu Precise, with -m32:

clang passes
clang -O9 also passes
gcc, assertion fails
gcc -O9 passes, because it is also eliminating the constant expressions
gcc -std=c99 fails
gcc -std=c1x also fails (but it may work on a later gcc)
gcc -ffloat-store passes, but seems to have the side effect of constant elimination

I don't think this is a gcc bug, because the standard allows this behavior, but the clang behavior is nicer.

Answer 4

In addition to all the details explained in other answers, I would say that there is a very simple rule concerning the use of floating point types in almost any programming language since Fortran: never check floating point values for precise equality. All the knowledge about 80-bit and 64-bit values is true, but it is true for a certain hardware and a certain compiler (yes, if you change the compiler or even turn optimizations on or off, something may change). The more general rule (applicable to any code that is intended to be portable) is that floating point values generally are not like integers or sequences of bytes: they can be changed, e.g. when copied, and checking them for equality often has unpredictable results.

So, even if it works in a test, it is usually better not to do so. It may fail later when something changes.

UPD: Though some people have downvoted, I insist the recommendation is generally correct. Operations that look like they merely copy a value, from a high-level programmer's point of view, MAY change floating point values. The initial example is typical: the value is returned and put into a variable and, voila, it is changed! Comparing floating point values for equality or inequality is often bad practice, acceptable ONLY if you know why it is safe in your particular case. Writing portable programs usually requires minimizing reliance on low-level knowledge. Yes, it is very unlikely that integer values like 0 or 1 change when put into a floating point variable or copied. But more complex values may (in the example above we see what happens to the result of a simple arithmetic expression!).
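One common way to follow this advice is to compare with a tolerance instead of ==. The sketch below is an illustration under stated assumptions, not something the answer prescribes: nearly_equal, abs_double and check_millis2seconds are hypothetical names, and the tolerance values are arbitrary and would need to be chosen for the application at hand.

```c
#include <assert.h>

/* Absolute value without pulling in the math library. */
static double abs_double(double x) { return x < 0.0 ? -x : x; }

/* Hypothetical helper: true when a and b differ by no more than abs_tol,
 * or by no more than rel_tol times the larger of the two magnitudes. */
static int nearly_equal(double a, double b, double rel_tol, double abs_tol) {
    double diff = abs_double(a - b);
    double larger = abs_double(a) > abs_double(b) ? abs_double(a) : abs_double(b);
    return diff <= abs_tol || diff <= rel_tol * larger;
}

/* The BUG variant from the question, unchanged in behavior. */
static double millis2seconds(int millis) {
    return (double)millis / 1000.0;
}

/* Same check as the question's assert, but tolerant of the tiny
 * difference between a 64-bit and an 80-bit result. */
static void check_millis2seconds(int millis) {
    double seconds = millis2seconds(millis);
    assert(nearly_equal(seconds, millis2seconds(millis), 1e-12, 0.0));
}
```

The absolute tolerance matters mainly near zero, where a purely relative test becomes too strict; the relative tolerance scales with the magnitude of the operands.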

4 answers

Answer 1
36, accepted, 2013-06-03 01:04:47

Answer 2
8, 2013-06-03 00:38:25

Answer 3
3, 2013-06-03 00:38:02

Answer 4
2, 2013-06-03 03:08:43
