Why does a number-crunching program start running much slower when it diverges into NaNs?

Question

A program repeats some calculation over an array of doubles. Then something unfortunate happens and NaNs get produced. It starts running much slower after this.

-ffast-math does not change a thing.

Why does it happen with -ffast-math? Shouldn't it prevent throwing floating-point exceptions and just proceed and churn out NaNs at the same rate as usual numbers?

Simple example:

nan.c

#include <stdio.h>
#include <math.h>

int main(void) {
    long long int i;
    double a=-1,b=0,c=1;

    for(i=0; i<100000000; ++i) {
        a+=0.001*(b+c)/1000;
        b+=0.001*(a+c)/1000;
        c+=0.001*(a+b)/1000;
        /* print progress once every million iterations */
        if(i%1000000==0) { fprintf(stdout, "%g\n", a); fflush(stdout); }
        /* halfway through, poison the state with a NaN */
        if(i==50000000) b=NAN;
    }
    return 0;
}

Running:

$ gcc -ffast-math -O3 nan.c -o nan && ./nan  | ts '%.s'
...
1389025567.070093 2.00392e+33
1389025567.085662 1.48071e+34
1389025567.100250 1.0941e+35
1389025567.115273 8.08439e+35
1389025567.129992 5.9736e+36
1389025568.261108 nan
1389025569.385904 nan
1389025570.515169 nan
1389025571.657104 nan
1389025572.805347 nan

Update: Tried various combinations of -O3, -ffast-math, -msse, -msse3, with no effect. However, when I tried building for 64-bit instead of the usual 32-bit, it started to process NaNs as fast as other numbers (in addition to a general 50% speedup), even without any optimisation options. Why are NaNs so slow in 32-bit mode with -ffast-math?

Answer 1

Floating-point operations on NaN are exceptional cases and definitely take longer to execute. This is important to remember when vectorizing with SSE, because any NaNs that sneak into don't-care columns in the registers can still make your code run much slower.

This page includes some performance measurements of math on NaN, and the results are even worse than I thought!

Answer 2

Your compiler defaults to using x87 (which incurs a stall for processing NaNs) when producing a 32-bit executable. Pass -mfpmath=sse to tell it to use SSE (which can handle NaNs at speed) instead.
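One rough way to check this on your own machine is to time the same recurrence with a finite and a NaN starting value. The sketch below assumes GCC on x86; the helper names run_recurrence and time_recurrence are made up for illustration and are not part of the original nan.c. Note that on a 32-bit target -mfpmath=sse only takes effect when SSE code generation is enabled, and double arithmetic needs SSE2, so a comparison would be a build with -m32 -mfpmath=387 against one with -m32 -msse2 -mfpmath=sse.

```c
#include <math.h>
#include <stdio.h>
#include <time.h>

/* Run the recurrence from nan.c for `iters` steps, starting with b = b0.
   Passing NAN as b0 turns every later value into NaN. */
double run_recurrence(long iters, double b0) {
    volatile double seed = b0;  /* volatile read: keeps the loop from being constant-folded */
    double a = -1, b = seed, c = 1;
    long i;
    for (i = 0; i < iters; ++i) {
        a += 0.001 * (b + c) / 1000;
        b += 0.001 * (a + c) / 1000;
        c += 0.001 * (a + b) / 1000;
    }
    return a;
}

/* Return the CPU seconds spent in run_recurrence. The result is printed
   so the compiler cannot discard the computation. */
double time_recurrence(long iters, double b0) {
    clock_t start = clock();
    double result = run_recurrence(iters, b0);
    clock_t end = clock();
    printf("b0=%g -> a=%g\n", b0, result);
    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

Calling time_recurrence(50000000, 0.0) and time_recurrence(50000000, NAN) from main and printing both return values should show whether the NaN run is slower in a given build. Avoid -ffast-math for this particular test, since it allows the compiler to assume NaNs never occur.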

2 answers

Answer 1
4 2014-01-06 16:31:44

Answer 2
4 accepted 2014-01-06 22:55:19
