Why does this simple program compiled with gcc, -mfpmath=387, and an optimization level of -O2 or -O3 produce NaN values?

Question

I have a short program that performs a numerical computation and obtains an incorrect NaN result when some specific conditions hold. I cannot see how this NaN result can arise. Note that I am not using compiler options that allow the reordering of arithmetic operations, such as -ffast-math.

Question: I am looking for an explanation of how the NaN result arises. Mathematically, there is nothing in the computation that leads to division by zero or similar. Am I missing something obvious?

Note that I am not asking how to fix the problem; that is easy. I am simply looking for an understanding of how the NaN appears.

Minimal example

Note that this example is very fragile, and even minor modifications, such as adding printf() calls in the loop to observe values, will change the behaviour. This is why I was unable to minimize it further.

// prog.c

#include <stdio.h>
#include <math.h>

typedef long long myint;

void fun(const myint n, double *result) {
    double z = -1.0;
    double phi = 0.0;
    for (myint i = 0; i < n; i++) {
        double r = sqrt(1 - z*z);

        /* avoids division by zero when r == 0 */
        if (i != 0 && i != n-1) {
            phi += 1.0 / r;
        }

        double x = r*cos(phi);
        double y = r*sin(phi);

        result[i + n*0] = x;
        result[i + n*1] = y;
        result[i + n*2] = z;

        z += 2.0 / (n - 1);
    }
}

#define N 11

int main(void) {
    // perform computation
    double res[3*N];
    fun(N, res);

    // output result
    for (int i=0; i < N; i++) {
        printf("%g %g %g\n", res[i+N*0], res[i+N*1], res[i+N*2]);
    }

    return 0;
}

Compile with:

gcc -O3 -mfpmath=387 prog.c -o prog -lm

The last line of the output is:

nan nan 1

Instead of NaN, I expect a number close to zero.

Critical features of the example

The following must all hold for the NaN output to appear:

Compile with GCC on an x86 platform. I was able to reproduce this with GCC 12.2.0 (from MacPorts) on macOS 10.14.6, as well as with GCC versions 9.3.0, 8.3.0 and 7.5.0 on Linux (openSUSE Leap 15.3).
I cannot reproduce it with GCC 10.2.0 or later on Linux, or GCC 11.3.0 on macOS.
Choose to use x87 instructions with -mfpmath=387, and an optimization level of -O2 or -O3.
myint must be a signed 64-bit type.
Thinking of result as an n-by-3 matrix, it must be stored in column-major order.
No printf() calls in the main loop of fun().

Without these features, I do get the expected output, i.e. something like 1.77993e-08 -1.12816e-08 1 or 0 0 1 as the last line.

Explanation of the program

Even though it doesn't really matter to the question, I give a short explanation of what the program does, to make it easier to follow. It computes the x, y, z three-dimensional coordinates of n points on the surface of a sphere in a specific arrangement. The z values go from -1 to 1 in equal increments; however, the last value won't be precisely 1 due to numerical round-off errors. The coordinates are written into an n-by-3 matrix, result, stored in column-major order. r and phi are polar coordinates in the (x, y) plane.

Note that when z is -1 or 1, r becomes 0. This happens in the first and last iteration steps. This would lead to division by 0 in the 1.0 / r expression. However, 1.0 / r is excluded from the first and last iterations of the loop.

Answer 1

This is caused by the interplay of x87 80-bit internal precision, non-conforming behavior of GCC, and optimization decisions that differ between compiler versions.

x87 supports IEEE binary32 and binary64 only as storage formats, converting to and from its 80-bit representation on loads and stores. To make program behavior predictable, the C standard requires that extra precision is dropped on assignments, and allows checking the intermediate precision via the FLT_EVAL_METHOD macro. With -mfpmath=387, FLT_EVAL_METHOD is 2, so you know that intermediate precision corresponds to the long double type.

Unfortunately, GCC does not drop extra precision on assignments unless you request stricter conformance via -std=cNN (as opposed to -std=gnuNN), or explicitly pass -fexcess-precision=standard.

In your program, the z += 2.0 / (n - 1); statement should be computed by:

Computing 2.0 / (n - 1) in the intermediate 80-bit precision.
Adding to the previous value of z (still in the 80-bit precision).
Rounding to the declared type of z (i.e. to binary64).

In the version that ends up with NaNs, GCC instead does the following:

Computes 2.0 / (n - 1) just once before the loop.
Rounds this fraction from binary80 to binary64 and stores it on the stack.
In the loop, it reloads this value from the stack and adds it to z.

This is non-conforming, because 2.0 / (n - 1) undergoes rounding twice (first to binary80, then to binary64).

The above explains why you saw different results depending on compiler version and optimization level. However, in general you cannot expect your computation to avoid producing NaNs in the last iteration. When n - 1 is not a power of two, 2.0 / (n - 1) is not exactly representable and may be rounded up. In that case, z may grow a bit faster than the true sum -1.0 + 2.0 / (n - 1) * i, and may end up above 1.0 for i == n - 1, causing sqrt(1 - z*z) to produce a NaN due to a negative argument.

In fact, if you change #define N 11 to #define N 12 in your program, you will deterministically get a NaN with both 80-bit and 64-bit intermediate precision.

Answer 2

... how the NaN result arises (?)

Even though better adherence to the C spec may apparently solve OP's immediate problem, I assert that other prevention practices should be considered.

sqrt(1 - z*z) is a candidate NaN when |z| > 1.0.

The index test meant to prevent division by zero may not be enough, which then leads to cos(INFINITY), another NaN possibility.

// /* avoids division by zero when r == 0 */
//    if (i != 0 && i != n-1) {
//        phi += 1.0 / r;
//    }

To avoid these, 1) test directly and 2) use a more precise approach.

if (r) {
  phi += 1.0 / r;
}

// double r = sqrt(1 - z*z);
double rr = (1-z)*(1+z);  // More precise than 1 - z*z
double r = rr < 0.0 ? 0.0 : sqrt(rr);
