GCC issue with -Ofast?
I have a question about recent GCC compilers (version >= 5) with this code:
#include <math.h>

void test_nan (
    const float * const __restrict__ in,
    const int n,
    char * const __restrict__ out )
{
    for (int i = 0; i < n; ++i)
        out[i] = isnan(in[i]);
}
The assembly listing from GCC:
test_nan:
        movq    %rdx, %rdi
        testl   %esi, %esi
        jle     .L1
        movslq  %esi, %rdx
        xorl    %esi, %esi
        jmp     memset
.L1:
        ret
This looks like memset(out, 0, n). Why does GCC assume that no entries can be NaN with -Ofast? With the same compilation options, ICC does not show this issue. With GCC, the issue goes away with "-O3".
Note that with "-O3", the query gcc -c -Q -O3 --help=optimizers | egrep -i nan gives -fsignaling-nans [disabled].
I verified this both locally and on godbolt, with the additional option "-std=c99".
Edit: by following the helpful answers below I can confirm that -Ofast -std=c99 -fno-finite-math-only properly addresses this issue.
From the GCC "Options That Control Optimization" documentation:
-Ofast enables the following optimizations in addition to -O3:
It turns on -ffast-math, -fallow-store-data-races and the Fortran-specific -fstack-arrays, unless -fmax-stack-var-size is specified, and -fno-protect-parens.
-ffast-math enables the following:
-fno-math-errno, -funsafe-math-optimizations, -ffinite-math-only, -fno-rounding-math, -fno-signaling-nans, -fcx-limited-range and -fexcess-precision=fast.
-ffinite-math-only does the following:
Allow optimizations for floating-point arithmetic that assume that arguments and results are not NaNs or +-Infs.
This allows the compiler to assume that isnan() always returns 0, which is why the whole loop collapses into a memset of zeros.
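If NaN detection has to keep working even when -ffinite-math-only is in effect, one common workaround is to test the bit pattern with integer operations, which the floating-point assumptions do not touch. The following is only a sketch: it assumes float is IEEE 754 binary32, and the names float_bits, float_from_bits, is_nan_bits and test_nan_bits are hypothetical, not part of the original question or of any library.

```c
#include <stdint.h>
#include <string.h>

/* Copy the bytes of a float into a 32-bit integer.
   memcpy avoids strict-aliasing problems. */
static uint32_t float_bits(float x)
{
    uint32_t u;
    memcpy(&u, &x, sizeof u);
    return u;
}

/* Inverse helper: build a float from an explicit bit pattern. */
static float float_from_bits(uint32_t u)
{
    float x;
    memcpy(&x, &u, sizeof x);
    return x;
}

/* Assumes IEEE 754 binary32: a NaN has all exponent bits set and a
   non-zero mantissa, so with the sign bit cleared its integer value
   is strictly greater than that of +Inf (0x7f800000). */
static int is_nan_bits(float x)
{
    return (float_bits(x) & 0x7fffffffu) > 0x7f800000u;
}

/* Same interface as test_nan from the question, using the integer test. */
void test_nan_bits(const float * const __restrict__ in,
                   const int n,
                   char * const __restrict__ out)
{
    for (int i = 0; i < n; ++i)
        out[i] = (char)is_nan_bits(in[i]);
}
```

Because the comparison is done on integers, the compiler has no floating-point assumption it could use to fold the result to 0. Passing -fno-finite-math-only, as in the question's edit, remains the simpler fix when changing the flags is an option.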
Barmar's answer explains why -Ofast causes the compiler to assume NaN never happens. I have two things to add to this.
First, you mentioned seeing -fsignaling-nans [disabled] in the --help=optimizers output. Signaling NaNs are a subcategory of all NaN bit patterns. The CPU will fire a floating-point exception when they are used (consult the architecture manual for exactly what "when they are used" means). Normally people use only the other kind, quiet NaNs, because dealing with floating-point exceptions is a pain; so, by default, GCC generates code that handles quiet NaNs (and ±Inf) but not signaling NaNs. isnan is true for both quiet and signaling NaNs. In short, -fsignaling-nans is a red herring; the option that directly controls the behavior you didn't like is -ffinite-math-only.
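To make the quiet/signaling distinction concrete, here is a small sketch that classifies a binary32 bit pattern. It assumes the encoding used on x86 and ARM, where the most significant mantissa bit set means "quiet"; some older architectures use the opposite convention. The names nan_kind, classify_nan_bits and classify_nan are hypothetical.

```c
#include <stdint.h>
#include <string.h>

enum nan_kind { NOT_NAN = 0, QUIET_NAN = 1, SIGNALING_NAN = 2 };

/* Classify a binary32 bit pattern.
   Assumption: the top mantissa bit (0x00400000) marks a quiet NaN,
   as on x86 and ARM. */
static enum nan_kind classify_nan_bits(uint32_t u)
{
    uint32_t exponent = u & 0x7f800000u;
    uint32_t mantissa = u & 0x007fffffu;

    /* Not a NaN unless the exponent is all ones and the mantissa is non-zero
       (a zero mantissa with that exponent is an infinity). */
    if (exponent != 0x7f800000u || mantissa == 0)
        return NOT_NAN;

    return (mantissa & 0x00400000u) ? QUIET_NAN : SIGNALING_NAN;
}

/* Convenience wrapper for a float value. Working on raw bits is safer for
   signaling NaNs, since some floating-point operations quiet them. */
static enum nan_kind classify_nan(float x)
{
    uint32_t u;
    memcpy(&u, &x, sizeof u);
    return classify_nan_bits(u);
}
```

Both kinds count as NaN for isnan, which is why the -fsignaling-nans setting has no bearing on the memset in the question.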
Second, if you were using -Ofast because you wanted this function to be vectorized, try -O3 -march=native instead. Loop vectorization is enabled at -O3, and -march=native directs GCC to optimize for the full capabilities of the CPU it's running on. Without any -march switches, GCC will assume it can only use CPU features that are guaranteed to be available by the psABI; for x86-64 (as it appears you have), that's SSE2 but nothing later, which leaves out most of the vector capabilities. On the computer I'm typing this on, -O3 -march=native produces code for your example function that's half the size and probably about four times as fast as -O3 alone.