Compiler flag for increased precision of intermediate floating-point calculations

Question

Is there a flag in gcc/clang that specifies the precision of intermediate floating-point calculations?

Suppose I have this C code:

double x = 3.1415926;
double y = 1.414;
double z = x * y;

Is there a compiler flag that allows 'x*y' to be calculated in the highest precision available on the user's machine, say long double (64-bit mantissa), and then truncated back to double (53-bit mantissa, the precision of the declared variable type)?

For information only, I am using Ubuntu 14.04 on a 64-bit machine.

Answer 1

GCC

[Edited to reflect the observed behavior of gcc 4.8.4, where the default behavior is the opposite of what the documentation describes.]

You need to make use of the 80-bit registers in the x87 FPU. With -mfpmath=387 you can override the default use of the SSE registers XMM0-XMM7. That default actually gives you the IEEE behavior, where each intermediate result is held at 64-bit precision at every step.

See: https://gcc.gnu.org/wiki/x87note

Thus, by default x87 arithmetic is not true 64/32-bit IEEE, but gets extended precision from the x87 unit. However, any time a value is moved from the registers to an IEEE 64-bit or 32-bit storage location, the 80-bit value must be rounded down to the appropriate number of bits.
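One way to check which behavior a given build actually has is the standard C99 macro FLT_EVAL_METHOD from <float.h>. The following is a minimal sketch; the helper names eval_method and evaluates_in_long_double are hypothetical and only for illustration. I would expect a build with -mfpmath=387 to report 2 and a default x86-64 SSE build to report 0, but that is an assumption worth verifying on your own compiler.

```c
#include <float.h>

/* FLT_EVAL_METHOD as seen by this translation unit (C99):
 *    0  operations are evaluated in the range and precision of their type
 *    1  float and double operations are evaluated as double
 *    2  all operations are evaluated as long double
 *   -1  indeterminate
 * Other negative values are implementation-defined. */
int eval_method(void)
{
    return (int)FLT_EVAL_METHOD;
}

/* Hypothetical helper: returns 1 if the given FLT_EVAL_METHOD value means
 * that double expressions are evaluated in long double, otherwise 0. */
int evaluates_in_long_double(int method)
{
    return method == 2;
}
```

Printing eval_method() from a small test program compiled with and without -mfpmath=387 should show whether the flag changed the evaluation mode.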

If your operation is extremely complex, however, register spilling may occur; the FP register stack is only 8 deep. When a spill copies a value out to a word-sized RAM location, the rounding happens at that point. You'll either need to declare long double yourself in this case and round manually at the end, or check the assembler output for explicit spills.

More information about registers here: https://software.intel.com/en-us/articles/introduction-to-x64-assembly
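The "declare long double yourself and round manually at the end" option could look like the sketch below. The function name dot_extended is hypothetical, and the sketch assumes that long double is the x87 80-bit extended format, as it is with gcc on x86 Linux; on other platforms long double may be a different format, or the same as double.

```c
#include <stddef.h>

/* Hypothetical example: dot product with every intermediate held in
 * long double, and a single rounding to double at the very end. */
double dot_extended(const double *a, const double *b, size_t n)
{
    long double acc = 0.0L;  /* extended-precision accumulator */

    for (size_t i = 0; i < n; ++i) {
        /* Widen both operands so the multiply and the add are
         * carried out in long double. */
        acc += (long double)a[i] * (long double)b[i];
    }

    /* The only explicit rounding to double (53-bit mantissa). */
    return (double)acc;
}
```

Because the intermediates are declared as long double, a spill to memory stores them in long double format, so this does not depend on the values staying in the FP register stack.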

In particular, the XMM0...7 registers, while 128 bits wide, are only that wide to accommodate two simultaneous 64-bit FP operations. So you want to be seeing the stack-operated FPR registers used with the FLD (load), FMUL (multiply), and FSTP (store-and-pop) instructions.

So I compiled this code:

double mult(double x, double y) {
    return x * y;
}

with:

gcc -mfpmath=387 -Ofast -o precision.s -S precision.c

And got:

mult:
  .LFB24:
    .cfi_startproc
    movsd   %xmm1, -8(%rsp)
    fldl    -8(%rsp)
    movsd   %xmm0, -8(%rsp)
    fldl    -8(%rsp)
    fmulp   %st, %st(1)
    fstpl   -8(%rsp)
    movsd   -8(%rsp), %xmm0
    ret
    .cfi_endproc

Everything makes perfect sense now. Floating-point values are passed in registers XMM0 and XMM1 (although they have to take a bizarre round trip through memory before they can be pushed onto the FPR stack), and the result is returned in XMM0, in accordance with the Intel reference above. I'm not sure why there isn't a simple FLD instruction that loads directly from XMM0/1, but apparently the instruction set doesn't provide one.

If you compare this to -mfpmath=sse, there's a lot less to do in the latter case, because the operands are ready and waiting in the XMM0/1 registers, and the whole computation is a single MULSD instruction.

Answer 1: 3, accepted, 2016-08-07 16:48:40
