How can I get the same behavior for double-precision operations in 32-bit and 64-bit mode?

Question

I have a library that I'm converting to 64 bits. However, I can't get bit-exact results in 64-bit mode, so my tests are failing.

I reduced the problem to a simple test case:

#include <stdio.h>

int main(void) {
    printf("%d bits: ", (int)(sizeof(void*) * 8)); /* %d needs an int, not a size_t */
    volatile double d = 10.870191700000001;
    volatile double x = 0.10090000000000002;
    d += x * 30.07;
    printf("%0.15f\n", d);
}

To avoid compiler differences, I'm using the same compiler and cross-compiling. In this case, I'm using TDM-GCC 64-bit 5.1.0 on Windows 7 on a Core i5 CPU. Here is my command line:

gcc double_test.c -o double_test.exe -m32 -O0 && double_test.exe && gcc double_test.c -o double_test.exe -m64 -O0 && double_test.exe

And the output is:

32 bits: 13.904254700000001
64 bits: 13.904254700000003

In this case the error is minimal, but in my full test cases the errors accumulate until they are large enough to double my output.

How can I get bit-exact operations that match the 32-bit output?

The nearest I got to something relevant was -ffloat-store, but in this snippet it made the 32-bit build behave like the 64-bit one, while I need just the opposite. However, it didn't have any noticeable effect on my library. I also tested the -fexcess-precision=standard and -mfpmath options, to no avail.

Answer 1

Since you said you need the more precise ...01 result as well as determinism, you unfortunately can't just use -msse2 -mfpmath=sse in your 32-bit build. Future readers who only need determinism should use that.

You can use -mfpmath=387 to ask gcc to use slow, obsolete x87 math in 64-bit mode, where it's not the default. The calling convention passes and returns FP args in xmm registers, so this is even worse than in 32-bit mode, sometimes requiring an extra store/reload.

peter@volta:/tmp$ gcc -m64 -mfpmath=387 -O3 fp-prec.c -o fp-64-387
peter@volta:/tmp$ ./fp-64-387 
64 bits: 13.904254700000001

I'm not sure whether gcc strictly limits itself to x87 when auto-vectorization is possible. If so, you're missing out on performance.

And by the way, in your example the ...01 is the result of keeping extra precision in an 80-bit temporary for x*30.07 before adding it to d. (d is volatile, but d += stuff is still equivalent to d = d + stuff, so x*30.07 doesn't get rounded to a 64-bit double first.)

You could use long double, e.g. d += x * (long double)30.07, to force an 80-bit temporary there. long double is 80 bits in the x86-64 System V ABI (Linux, OS X, *BSD, etc.), but on x64 Windows with MSVC it's the same as a 64-bit double. So that might not be an option for you.

In this case you can get the same result with an FMA, which keeps the product exact (as if with infinite precision) and rounds only once, after the add. This is slow on hardware without FMA support, where the C library has to emulate it in software, but fma(x, 30.07, d) will reliably give the result you want.

If you need this, use it in the places where that precision is required.

If you compile with FMA enabled (e.g. -march=native on my Skylake CPU), the call can be inlined as an FMA instruction.

Even without using the fma() function from math.h, gcc will contract mul+add expressions into an FMA when optimizing. (Unlike Clang, which I think doesn't do FP_CONTRACT by default without -ffast-math.) Note that I'm not using -mfpmath=387 here:

 # your original source code, using an FMA instruction (native=skylake in my case)
peter@volta:/tmp$ gcc -m64 -march=native -O3 fp-prec.c -o fp-64-native
peter@volta:/tmp$ ./fp-64-native 
64 bits: 13.904254700000001

The relevant part of main is:

 57e:   c5 fb 10 44 24 08       vmovsd xmm0,QWORD PTR [rsp+0x8] # load x
 584:   c5 fb 10 0c 24          vmovsd xmm1,QWORD PTR [rsp]     # load d
 589:   c4 e2 f1 99 05 d6 01 00 00      vfmadd132sd xmm0,xmm1,QWORD PTR [rip+0x1d6]        # the 30.07 constant
 592:   c5 fb 11 04 24          vmovsd QWORD PTR [rsp],xmm0     # store d
 597:   c5 fb 10 04 24          vmovsd xmm0,QWORD PTR [rsp]     # reload d
 59c:   e8 8f ff ff ff          call   530 <printf@plt>

FP determinism is hard in general.

See also https://randomascii.wordpress.com/2013/07/16/floating-point-determinism/ and https://randomascii.wordpress.com/2012/03/21/intermediate-floating-point-precision/

Answer 2

I would not aim to reproduce the 32-bit output, since it's a consequence of excess precision in the 32-bit x86 (x87) ABI and possibly also of compiler non-conformance. Instead, try to match the 64-bit output, which is what you should expect on good targets. As long as you're okay with requiring a machine with SSE2 or later, -mfpmath=sse (together with -msse2) will make 32-bit x86 behave like x86-64 and other more reasonable targets.

If you really need the 32-bit x86 result, ideally you should compute it portably. This might involve breaking things down into a pair of doubles, but if you only target x86 you could just use long double. In the particular example in your question, the fma function would work, too.


2 answers
Answer 1: score 4, accepted, 2018-10-09 23:34:43
Answer 2: score 2, 2018-10-09 23:47:59
