
Mixing SSE2 and AVX intrinsics with different compilers

Is it possible to mix VEX and non-VEX encoded SIMD intrinsics in the same compilation unit? I want to do it to simplify releasing code to different compilers as single-file modules.

You don't need to do this, and it's often better to just build whole files with -march=haswell vs. -march=core2 or something, so you can set tuning options as well as a target instruction set.

But separate compilation units make it harder to let small functions inline, so maybe there is a use-case here, as long as you're careful not to actually cause SSE-AVX transition penalties from mixing VEX/non-VEX without vzeroupper, and not to put VEX-coded instructions into code paths that run on CPUs without AVX support.

IDK how well compilers respect target attributes when inlining, but link-time optimization can also inline code from compilation units compiled with different options, and AFAIK that doesn't cause problems.
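To illustrate the second hazard above (VEX-coded instructions reaching a CPU without AVX), here is a minimal sketch of runtime dispatch inside one compilation unit. It assumes gcc or clang on x86, uses the GNU C target attribute that the answer describes below, and relies on the `__builtin_cpu_supports` builtin. The function names `add_arrays`, `add_arrays_sse` and `add_arrays_avx` are hypothetical, made up for this example.

```c
#include <immintrin.h>
#include <stddef.h>

/* Baseline version: compiled with the file's default target (e.g. SSE2),
   so it uses legacy-SSE encodings. */
static void add_arrays_sse(float *dst, const float *a, const float *b, size_t n) {
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);
        __m128 vb = _mm_loadu_ps(b + i);
        _mm_storeu_ps(dst + i, _mm_add_ps(va, vb));
    }
    for (; i < n; i++)          /* scalar tail */
        dst[i] = a[i] + b[i];
}

/* AVX version: the attribute enables AVX (and VEX encoding) for this
   function only, even if the file is built without -mavx. */
static __attribute__((target("avx")))
void add_arrays_avx(float *dst, const float *a, const float *b, size_t n) {
    size_t i = 0;
    for (; i + 8 <= n; i += 8) {
        __m256 va = _mm256_loadu_ps(a + i);
        __m256 vb = _mm256_loadu_ps(b + i);
        _mm256_storeu_ps(dst + i, _mm256_add_ps(va, vb));
    }
    for (; i < n; i++)          /* scalar tail */
        dst[i] = a[i] + b[i];
}

/* Hypothetical dispatcher: only calls the AVX version when the CPU
   reports AVX support at run time. */
void add_arrays(float *dst, const float *a, const float *b, size_t n) {
    if (__builtin_cpu_supports("avx"))
        add_arrays_avx(dst, a, b, n);
    else
        add_arrays_sse(dst, a, b, n);
}
```

Callers only ever use `add_arrays(dst, a, b, n)`. Keeping the check in one non-AVX wrapper is a design choice for the sketch: it keeps the VEX-coded body off code paths that can run on older CPUs, at the cost of one branch per call.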


With GNU C function attributes, yes. This works with gcc and clang, but apparently not with ICC, even though ICC doesn't reject the attribute syntax.

Obviously it doesn't work with MSVC, which has different command line options anyway. With MSVC, you can compile a file that uses AVX intrinsics without /arch:AVX, but DON'T do that; it will use VEX encoding only for the instructions that aren't encodeable at all with legacy SSE, like _mm_permutevar_ps (vpermilps), leading to transition penalties.


The GNU C way:

#include <immintrin.h>

__m128 addps_sse(__m128 x, __m128 y) {
    return x+y;       // GNU C alternative to _mm_add_ps.
}

__attribute((target("avx")))    // <<<<<<<<<<< This line
__m128 addps_avx(__m128 x, __m128 y) {
    return x+y;
}

Compiled (on the Godbolt compiler explorer) with gcc and clang -O3 -march=nehalem, which makes SSE4.2 available (and tunes for Nehalem) but doesn't enable AVX:

addps_sse:
        addps   xmm0, xmm1
        ret
addps_avx:
        vaddps  xmm0, xmm0, xmm1
        ret

Both gcc and clang emit identical asm, of course. ICC uses addps (non-VEX) for both versions. I didn't check whether ICC allows _mm256 intrinsics inside the function with AVX enabled, but gcc should.
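As a rough sketch of what using `_mm256` intrinsics inside such a function could look like with gcc or clang (not checked against ICC or MSVC), the hypothetical helper `sum_floats_avx` below sums an array with 256-bit vectors. The explicit `_mm256_zeroupper()` is there to make the transition point visible; compilers that honor the attribute generally insert vzeroupper on their own, so treat it as illustrative rather than required.

```c
#include <immintrin.h>
#include <stddef.h>

/* Hypothetical helper: sum n floats using 256-bit AVX vectors.
   target("avx") makes the _mm256 intrinsics usable in this function
   even when the rest of the file is compiled without -mavx. */
__attribute__((target("avx")))
float sum_floats_avx(const float *p, size_t n) {
    __m256 acc = _mm256_setzero_ps();
    size_t i = 0;
    for (; i + 8 <= n; i += 8)
        acc = _mm256_add_ps(acc, _mm256_loadu_ps(p + i));

    /* Reduce 8 lanes to 1: add the two 128-bit halves, then fold
       the remaining 4 lanes down to lane 0. */
    __m128 lo = _mm256_castps256_ps128(acc);
    __m128 hi = _mm256_extractf128_ps(acc, 1);
    __m128 s  = _mm_add_ps(lo, hi);
    s = _mm_add_ps(s, _mm_movehl_ps(s, s));
    s = _mm_add_ss(s, _mm_shuffle_ps(s, s, 1));
    float total = _mm_cvtss_f32(s);

    for (; i < n; i++)          /* scalar tail */
        total += p[i];

    /* Clear the upper halves of the YMM registers before returning
       to callers that may run legacy-SSE code. */
    _mm256_zeroupper();
    return total;
}
```

This function must only be called after a runtime check such as `__builtin_cpu_supports("avx")`, since it executes VEX-coded instructions unconditionally.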
