GCC issue with -Ofast?

I have a question about the latest GCC compilers (version >= 5) with this code:

#include <math.h>

void test_nan (
    const float * const __restrict__ in,
    const int n,
    char * const __restrict__ out )
{
    for (int i = 0; i < n; ++i)
        out[i] = isnan(in[i]);
}

The assembly listing from GCC:

test_nan:
        movq    %rdx, %rdi
        testl   %esi, %esi
        jle     .L1
        movslq  %esi, %rdx
        xorl    %esi, %esi
        jmp     memset
.L1:
        ret

This looks like memset(out, 0, n). Why does GCC assume that no entries can be NaN with -Ofast? With the same compilation options, ICC does not show this issue. With GCC, the issue goes away with "-O3".

Note that with "-O3", the query gcc -c -Q -O3 --help=optimizers | egrep -i nan gives -fsignaling-nans [disabled].

I verified this both locally and on godbolt, with the additional option "-std=c99".

Edit: by following the helpful answers below, I can confirm that -Ofast -std=c99 -fno-finite-math-only properly addresses this issue.
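The effect of that flag is visible to the preprocessor: GCC predefines __FINITE_MATH_ONLY__ as 1 when -ffinite-math-only is in effect and as 0 otherwise. A minimal sketch of a build-time check based on that macro follows; the names NAN_CHECKS_RELIABLE and nan_checks_reliable are made up for illustration and are not part of any library.

```c
/* GCC predefines __FINITE_MATH_ONLY__ as 1 under -ffinite-math-only
   (which -Ofast implies through -ffast-math) and as 0 otherwise. */
#if defined(__FINITE_MATH_ONLY__) && __FINITE_MATH_ONLY__
#define NAN_CHECKS_RELIABLE 0   /* isnan() may be folded to 0 */
#else
#define NAN_CHECKS_RELIABLE 1   /* isnan() keeps its usual meaning */
#endif

/* Hypothetical helper: returns 1 if isnan() can be trusted in this
   translation unit, 0 if the build assumes NaNs never occur. */
int nan_checks_reliable(void)
{
    return NAN_CHECKS_RELIABLE;
}
```

Built with -Ofast alone this should return 0, and with -Ofast -fno-finite-math-only it should return 1. A stricter variant could turn the first branch into an #error so that a file relying on isnan() refuses to compile under -ffinite-math-only.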

From the GCC Options That Control Optimization documentation:

-Ofast enables the following optimizations in addition to -O3:

It turns on -ffast-math, -fallow-store-data-races and the Fortran-specific -fstack-arrays, unless -fmax-stack-var-size is specified, and -fno-protect-parens.

-ffast-math enables the following:

-fno-math-errno, -funsafe-math-optimizations, -ffinite-math-only, -fno-rounding-math, -fno-signaling-nans, -fcx-limited-range and -fexcess-precision=fast.

-ffinite-math-only does the following:

Allow optimizations for floating-point arithmetic that assume that arguments and results are not NaNs or +-Infs.

This allows the compiler to assume that isnan() always returns 0, so the whole loop reduces to filling out with zeros.
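If the rest of the file has to stay under -Ofast, one commonly used workaround is to test the bit pattern with integer operations instead of calling isnan(). The sketch below assumes float is IEEE 754 binary32; is_nan_bits and test_nan_bits are hypothetical names. That the optimizer leaves integer tests like this alone under -ffinite-math-only is observed behaviour rather than a documented guarantee, so it is worth checking the generated assembly.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: NaN test on the raw binary32 encoding.
   Assumes float is IEEE 754 binary32 (true on x86-64). */
int is_nan_bits(float x)
{
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);   /* well-defined way to read the bits */
    /* Clear the sign bit; a NaN has all exponent bits set and a
       non-zero mantissa, so it compares above the +Inf pattern. */
    return (bits & 0x7fffffffu) > 0x7f800000u;
}

/* Same interface as test_nan from the question, but without isnan(). */
void test_nan_bits(const float *restrict in, int n, char *restrict out)
{
    for (int i = 0; i < n; ++i)
        out[i] = (char)is_nan_bits(in[i]);
}
```

The comparison is purely an integer operation, so the assumption that floating-point values are never NaN does not apply to it directly.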

Barmar's answer explains why -Ofast causes the compiler to assume NaN never happens. I have two things to add to this.

First, you said something about seeing -fsignaling-nans [disabled] in the --help=optimizers output. Signaling NaNs are a subcategory of all NaN bit patterns. The CPU will fire a floating-point exception when they are used (consult the architecture manual for exactly what "when they are used" means). Normally people use only the other kind, quiet NaNs, because dealing with floating-point exceptions is a pain; so, by default, GCC generates code that handles quiet NaNs (and ±Inf) but not signaling NaNs. isnan is true for both quiet and signaling NaNs. In short, -fsignaling-nans is a red herring; the option that directly controls the behavior you didn't like is -ffinite-math-only.
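To make the quiet/signaling distinction concrete, here is a small sketch that classifies a raw binary32 bit pattern. It assumes the IEEE 754-2008 convention, used by x86 and ARM among others, in which the top mantissa bit set means "quiet"; some older architectures such as legacy MIPS use the opposite convention. The function name nan_kind and its return codes are made up for illustration.

```c
#include <stdint.h>

/* Hypothetical classifier for raw binary32 bit patterns.
   Returns 0 = not a NaN, 1 = quiet NaN, 2 = signaling NaN.
   Assumes the IEEE 754-2008 quiet-bit convention (x86, ARM). */
int nan_kind(uint32_t bits)
{
    uint32_t exponent = bits & 0x7f800000u;
    uint32_t mantissa = bits & 0x007fffffu;

    if (exponent != 0x7f800000u || mantissa == 0)
        return 0;                              /* finite value or ±Inf */
    return (mantissa & 0x00400000u) ? 1 : 2;   /* top mantissa bit = quiet */
}
```

Both kinds share the all-ones exponent and a non-zero mantissa, which is why isnan is true for either one and why -fsignaling-nans has no bearing on the result in the question.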

Second, if you were using -Ofast because you wanted this function to be vectorized, try -O3 -march=native instead. Loop vectorization is enabled at -O3, and -march=native directs GCC to optimize for the full capabilities of the CPU it's running on. Without any -march switches, GCC will assume it can only use CPU features that are guaranteed to be available by the psABI; for x86-64 (as it appears you have), that's SSE2 but nothing later, which leaves out most of the vector capabilities. On the computer I'm typing this on, -O3 -march=native produces code for your example function that's half the size of, and probably about four times as fast as, what -O3 alone produces.
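One way to see what a given set of -march switches actually enables is to look at the predefined macros GCC sets for each x86 vector extension. The sketch below checks a few of them (__SSE2__, __AVX__, __AVX2__, __AVX512F__); vector_isa is a hypothetical helper name, and the list of extensions is deliberately not exhaustive.

```c
/* Hypothetical helper: names the widest x86 vector extension, among the
   few checked here, that the compiler was allowed to use for this file. */
const char *vector_isa(void)
{
#if defined(__AVX512F__)
    return "AVX-512F";
#elif defined(__AVX2__)
    return "AVX2";
#elif defined(__AVX__)
    return "AVX";
#elif defined(__SSE2__)
    return "SSE2";
#else
    return "none of the x86 extensions checked here";
#endif
}
```

On x86-64 with no -march switch this would be expected to report SSE2, in line with the psABI baseline described above; with -march=native the answer depends on the machine doing the compiling.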
