std::min vs ternary: gcc auto-vectorization with #pragma GCC optimize ("O3")

I know that "why is my compiler doing this" questions aren't the best kind, but this one is really bizarre to me and I'm thoroughly confused.

I had thought that std::min() was the same as the handwritten ternary (with maybe some compile-time template stuff), and it seems to compile down to the same operation when used normally. However, when I try to make a "min and sum" loop autovectorize, the two don't seem to be the same, and I would love it if someone could help me figure out why. Here is a small example that reproduces the issue:

#pragma GCC target ("avx2")
#pragma GCC optimize ("O3")

#include <cstdio>
#include <cstdlib>
#include <algorithm>

#define N (1<<20)
char a[N], b[N];

int main() {
    for (int i=0; i<N; ++i) {
        a[i] = rand()%100;
        b[i] = rand()%100;
    }

    int ans = 0;
    #pragma GCC ivdep
    for (int i=0; i<N; ++i) {
        //ans += std::min(a[i], b[i]);
        ans += a[i]>b[i] ? a[i] : b[i];
    }
    printf("%d\n", ans);
}

I compile this on gcc 9.3.0 with the command g++ -o test test.cpp -ftree-vectorize -fopt-info-vec-missed -fopt-info-vec-optimized -funsafe-math-optimizations.

With the code above as is, the compiler reports:

test.cpp:19:17: optimized: loop vectorized using 32 byte vectors

In contrast, if I comment out the ternary and uncomment the std::min, I get this:

test.cpp:19:17: missed: couldn't vectorize loop
test.cpp:20:35: missed: statement clobbers memory: _9 = std::min<char> (_8, _7);

So std::min() seems to be doing something unusual that prevents gcc from understanding that it is just a min operation. Is this caused by the standard? Or is it an implementation failure? Or is there some compile flag that would make this work?

Summary: don't use #pragma GCC optimize. Use -O3 on the command line instead, and you'll get the behavior you expect.

GCC's documentation on #pragma GCC optimize says:

Each function that is defined after this point is treated as if it had been declared with one optimize(string) attribute for each string argument.

And the optimize attribute is documented as:

The optimize attribute is used to specify that a function is to be compiled with different optimization options than specified on the command line. [...] The optimize attribute should be used for debugging purposes only. It is not suitable in production code. [Emphasis added, thanks Peter Cordes for spotting the last part.]

So, don't use it.

In particular, it looks like specifying #pragma GCC optimize ("O3") at the top of your file is not actually equivalent to using -O3 on the command line. It turns out that the former doesn't result in std::min being inlined, so the compiler has to assume that the call might modify global memory, such as your arrays a and b. This naturally inhibits vectorization.

A careful reading of the documentation for __attribute__((optimize)) suggests that each of the functions main() and std::min() will be compiled as if with -O3. But that's not the same as compiling the two of them together with -O3, as only in the latter case would interprocedural optimizations like inlining be available.
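To make the per-function reading concrete, here is a small sketch of the equivalence the documentation describes. The function names are hypothetical and not from the question. push_options and pop_options limit the pragma's reach, so the second function gets its options only from the explicit attribute.

```cpp
// GCC-specific sketch; function names are made up for illustration.

#pragma GCC push_options
#pragma GCC optimize ("O3")
// Defined after the pragma: treated as if it carried optimize("O3").
int scaled_by_pragma(int x) { return 3 * x; }
#pragma GCC pop_options

// Outside the pragma's range: the attribute is spelled out explicitly.
__attribute__((optimize("O3")))
int scaled_by_attribute(int x) { return 3 * x; }
```

Either way the option is attached to individual functions, which is different from raising the optimization level of the whole translation unit.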

Here is a very simple example on godbolt. With #pragma GCC optimize ("O3"), the functions foo() and please_inline_me() are each optimized, but please_inline_me() does not get inlined. With -O3 on the command line, it does.
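The linked example is not reproduced in the text, so the following is only a guess at its shape. The names foo() and please_inline_me() come from the answer; the function bodies are invented for illustration.

```cpp
// Hypothetical reconstruction; only the function names come from the answer.
#pragma GCC optimize ("O3")

// Small enough that an optimizing build would normally inline it.
int please_inline_me(int x) { return x * 2 + 1; }

// Calls the helper twice; check whether the calls survive in the assembly.
int foo(int x) { return please_inline_me(x) + please_inline_me(x + 1); }
```

One way to check is to compile with g++ -S and no -O flag, then remove the pragma and compile with g++ -O3 -S, and look in each output for a call to please_inline_me inside foo. Results may differ between GCC versions.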

A guess would be that the optimize attribute, and by extension #pragma GCC optimize, causes the compiler to treat the function as if its definition were in a separate source file compiled with the specified option. And indeed, if std::min() and main() were defined in separate source files, you could compile each one with -O3 but you wouldn't get inlining.

Arguably the GCC manual should document this more explicitly, though I guess if it's only meant for debugging, it might be fair to assume it's intended for experts who would be familiar with the distinction.

If you really do compile your example with -O3 on the command line, you get identical (vectorized) assembly for both versions, or at least I did. (After fixing the backwards comparison: your ternary code is computing max instead of min.)
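For reference, here is a sketch of the two loops with the comparison corrected and no optimization pragmas. Moving the loops into functions that take pointers is my restructuring, not the question's layout, and the function names are hypothetical.

```cpp
#include <algorithm>
#include <cstddef>

// Hand-written ternary, with the comparison flipped so it computes min.
int sum_min_ternary(const char* a, const char* b, std::size_t n) {
    int ans = 0;
    for (std::size_t i = 0; i < n; ++i) {
        ans += a[i] < b[i] ? a[i] : b[i];
    }
    return ans;
}

// Same reduction written with std::min.
int sum_min_std(const char* a, const char* b, std::size_t n) {
    int ans = 0;
    for (std::size_t i = 0; i < n; ++i) {
        ans += std::min(a[i], b[i]);
    }
    return ans;
}
```

Compiling this with something like g++ -O3 -mavx2 -fopt-info-vec-optimized -c on the command line should let you see whether both loops are reported as vectorized on your GCC version.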
