Why does a number-crunching program start running much slower when it diverges into NaNs?
A program repeats some calculation over an array of doubles. Then something unfortunate happens and NaNs get produced. After that, it starts running much slower.
-ffast-math does not change a thing.
Why does this happen even with -ffast-math? Shouldn't it prevent throwing floating-point exceptions and just proceed, churning out NaNs at the same rate as ordinary numbers?
Simple example:
nan.c
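#include <stdio.h>
#include <math.h>

int main() {
    long long int i;
    double a=-1,b=0,c=1;
    for(i=0; i<100000000; ++i) {
        /* coupled update: each variable grows from the other two */
        a+=0.001*(b+c)/1000;
        b+=0.001*(a+c)/1000;
        c+=0.001*(a+b)/1000;
        /* print progress once per million iterations */
        if(i%1000000==0) { fprintf(stdout, "%g\n", a); fflush(stdout); }
        /* halfway through, poison the state with a NaN */
        if(i==50000000) b=NAN;
    }
    return 0;
}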
Running:
$ gcc -ffast-math -O3 nan.c -o nan && ./nan | ts '%.s'
...
1389025567.070093 2.00392e+33
1389025567.085662 1.48071e+34
1389025567.100250 1.0941e+35
1389025567.115273 8.08439e+35
1389025567.129992 5.9736e+36
1389025568.261108 nan
1389025569.385904 nan
1389025570.515169 nan
1389025571.657104 nan
1389025572.805347 nan
Update: I tried various combinations of -O3, -ffast-math, -msse and -msse3, with no effect. However, when I built for 64-bit instead of the usual 32-bit, it started to process NaNs as fast as other numbers (in addition to a general 50% speedup), even without any optimisation options. Why are NaNs so slow in 32-bit mode with -ffast-math?
Floating-point operations on NaN are exceptional cases and can take much longer to execute. This is important to remember when vectorizing with SSE, because any NaNs that sneak into don't-care lanes of the registers can still make your code run much slower.
This page includes some performance measurements of math on NaN, and the penalty is even worse than I thought!
Your compiler defaults to using x87 (which incurs a stall when processing NaNs) when producing a 32-bit executable. Pass -mfpmath=sse to tell it to use SSE (which can handle NaNs at full speed) instead.