
Why does a number-crunching program start running much slower when it diverges into NaNs?

A program repeats some calculation over an array of doubles. Then something unfortunate happens and NaNs get produced. After that, it starts running much slower.

-ffast-math does not change a thing.

Why does this happen even with -ffast-math? Shouldn't it prevent floating-point exceptions from being thrown, and just proceed and churn out NaNs at the same rate as ordinary numbers?

Simple example:

nan.c

#include <stdio.h>
#include <math.h>

int main() {
    long long int i;
    double a=-1,b=0,c=1;

    for(i=0; i<100000000; ++i) {
        a+=0.001*(b+c)/1000;
        b+=0.001*(a+c)/1000;
        c+=0.001*(a+b)/1000;
        if(i%1000000==0) { fprintf(stdout, "%g\n", a); fflush(stdout); }
        if(i==50000000) b=NAN;
    }
    return 0;
}

Running:

$ gcc -ffast-math -O3 nan.c -o nan && ./nan  | ts '%.s'
...
1389025567.070093 2.00392e+33
1389025567.085662 1.48071e+34
1389025567.100250 1.0941e+35
1389025567.115273 8.08439e+35
1389025567.129992 5.9736e+36
1389025568.261108 nan
1389025569.385904 nan
1389025570.515169 nan
1389025571.657104 nan
1389025572.805347 nan

Update: I tried various combinations of -O3, -ffast-math, -msse and -msse3, with no effect. However, when I built for 64-bit instead of the usual 32-bit, it started to process NaNs as fast as other numbers (in addition to a general 50% speedup), even without any optimisation options. Why are NaNs so slow in 32-bit mode with -ffast-math?

Floating-point operations on NaN are exceptional cases and can take much longer to execute, depending on the hardware and instruction set. This is important to remember when vectorizing with SSE, because any NaNs that sneak into don't-care lanes of the registers can still make your code run much slower.

This page includes some performance measurements of math on NaN, and the slowdown is even worse than I thought!

Your compiler defaults to using x87 (which incurs a stall when processing NaNs) when producing a 32-bit executable. Pass -mfpmath=sse to tell it to use SSE (which can handle NaNs at full speed) instead.
