Mixing SSE2 and AVX intrinsics with different compilers
Is it possible to mix VEX and non-VEX encoded SIMD intrinsics in the same compilation unit? I want to do this to simplify releasing code to different compilers as single-file modules.
You don't need to do this, and it's often better to just build whole files with -march=haswell vs. -march=core2 or something, so you can set tuning options as well as a target instruction set.
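When each file is built with its own -march, GCC and clang predefine feature-test macros such as __SSE4_2__, __AVX__ and __AVX2__ to match the enabled instruction set. A minimal sketch, assuming GCC or clang (the function name compiled_simd_level is hypothetical), that reports which level a translation unit was actually built for:

```c
/* Reports the newest x86 SIMD extension enabled for THIS translation unit.
   The macros come from -march / -m options, so building the same file with
   -march=core2 vs. -march=haswell changes the result (SSSE3 vs. AVX2). */
const char *compiled_simd_level(void) {
#if defined(__AVX2__)
    return "AVX2";
#elif defined(__AVX__)
    return "AVX";
#elif defined(__SSE4_2__)
    return "SSE4.2";
#elif defined(__SSSE3__)
    return "SSSE3";
#elif defined(__SSE2__)
    return "SSE2";   /* baseline for x86-64 */
#else
    return "none";   /* e.g. a non-x86 target */
#endif
}
```

A per-file build only changes these macros for the file it applies to, which is why whole-file -march builds are easy to reason about.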
But separate compilation units make it harder for small functions to inline, so maybe there is a use case here, as long as you're careful not to actually cause SSE-AVX transition penalties by mixing VEX and non-VEX code without vzeroupper, or to put VEX-coded instructions into code paths that run on CPUs without AVX support.
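One way to keep VEX-coded instructions off CPUs without AVX is runtime dispatch. A minimal sketch, assuming x86-64 with GCC or clang: one file holds a baseline SSE loop and an AVX loop built with the GNU C target attribute, and a resolver picks one with __builtin_cpu_supports("avx"). The names add_arrays_sse, add_arrays_avx, select_add_arrays and add_fn are hypothetical.

```c
#include <immintrin.h>
#include <stddef.h>

typedef void (*add_fn)(float *, const float *, const float *, size_t);

/* Baseline: compiled with the file's default options, so on x86-64
   without -mavx this is legacy-SSE (non-VEX) addps. */
static void add_arrays_sse(float *dst, const float *a, const float *b, size_t n) {
    size_t i = 0;
    for (; i + 4 <= n; i += 4)
        _mm_storeu_ps(dst + i, _mm_add_ps(_mm_loadu_ps(a + i), _mm_loadu_ps(b + i)));
    for (; i < n; i++)
        dst[i] = a[i] + b[i];
}

/* AVX enabled for this function only: VEX encoding and 256-bit vectors.
   gcc and clang emit vzeroupper before returning from a function that
   dirtied the upper halves of ymm registers, so the non-VEX caller
   does not pay a transition penalty. */
static __attribute__((target("avx")))
void add_arrays_avx(float *dst, const float *a, const float *b, size_t n) {
    size_t i = 0;
    for (; i + 8 <= n; i += 8)
        _mm256_storeu_ps(dst + i,
                         _mm256_add_ps(_mm256_loadu_ps(a + i), _mm256_loadu_ps(b + i)));
    for (; i < n; i++)
        dst[i] = a[i] + b[i];
}

/* Choose once, based on the CPU we are actually running on. */
add_fn select_add_arrays(void) {
    __builtin_cpu_init();   /* only required before constructors run; harmless here */
    return __builtin_cpu_supports("avx") ? add_arrays_avx : add_arrays_sse;
}
```

Callers store the returned pointer once and call through it, so the AVX path is only ever reached on hardware that reports AVX support.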
IDK how well compilers respect target attributes when inlining, but link-time optimization can also inline code from compilation units compiled with different options, and AFAIK that doesn't cause problems.
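For what it's worth, GCC and clang inline a target-attributed function only into callers whose enabled ISA extensions include the callee's; if the helper is marked always_inline and the caller lacks AVX, GCC reports an "inlining failed ... target specific option mismatch" error. That is also why the immintrin.h intrinsics work inside target("avx") functions. A minimal sketch, assuming x86-64 with GCC or clang; mul_add8 and saxpy_avx are hypothetical names:

```c
#include <immintrin.h>
#include <stddef.h>

/* Small AVX helper (separate multiply and add, no FMA). It can be inlined
   into any caller that also has AVX enabled. */
static inline __attribute__((target("avx")))
__m256 mul_add8(__m256 a, __m256 b, __m256 c) {
    return _mm256_add_ps(_mm256_mul_ps(a, b), c);
}

/* Same target as the helper, so mul_add8 is eligible for inlining here.
   Only call this after checking for AVX at runtime. */
__attribute__((target("avx")))
void saxpy_avx(float *y, const float *x, float a, size_t n) {
    __m256 va = _mm256_set1_ps(a);
    size_t i = 0;
    for (; i + 8 <= n; i += 8)
        _mm256_storeu_ps(y + i, mul_add8(va, _mm256_loadu_ps(x + i), _mm256_loadu_ps(y + i)));
    for (; i < n; i++)   /* scalar tail */
        y[i] += a * x[i];
}
```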
With GNU C function attributes, yes. This works with gcc and clang, but apparently not with ICC, even though ICC doesn't reject the attribute syntax.
Obviously it doesn't work with MSVC, which has different command-line options anyway. With MSVC, you can compile a file that uses AVX intrinsics without /arch:AVX, but DON'T do that: it will use VEX encoding only for the instructions that can't be encoded at all with legacy SSE, like _mm_permutevar_ps (vpermilps), leading to transition penalties.
The GNU C way:
#include <immintrin.h>

__m128 addps_sse(__m128 x, __m128 y) {
    return x+y;   // GNU C alternative to _mm_add_ps.
}

__attribute((target("avx")))   // <<<<<<<<<<< This line
__m128 addps_avx(__m128 x, __m128 y) {
    return x+y;
}
Compiled (on the Godbolt compiler explorer) with gcc and clang at -O3 -march=nehalem, which makes SSE4.2 available (and tunes for Nehalem) but doesn't enable AVX:
addps_sse:
        addps   xmm0, xmm1
        ret
addps_avx:
        vaddps  xmm0, xmm0, xmm1
        ret
Both gcc and clang emit identical asm, of course. ICC uses addps (non-VEX) for both versions. I didn't check whether ICC allows _mm256 intrinsics inside the function with AVX enabled, but gcc should.