Mixing SSE2 and AVX intrinsics with different compilers

Question

Is it possible to mix VEX-encoded and non-VEX-encoded SIMD intrinsics in the same compilation unit? I want to do this so I can release the code as single-file modules that build with different compilers.

Answer 1

You don't need to do this, and it's often better to just build whole files with different options (e.g. -march=haswell vs. -march=core2), so you can set tuning options as well as the target instruction set.

But separate compilation units make it harder for small functions to inline, so maybe there is a use case here, as long as you're careful not to cause SSE-AVX transition penalties by mixing VEX and non-VEX instructions without vzeroupper, and not to put VEX-coded instructions into code paths that run on CPUs without AVX support.

IDK how well compilers respect target attributes when inlining, but link-time optimization can inline code from compilation units compiled with different options, too, and AFAIK that doesn't cause problems.
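To make the caveat about CPUs without AVX concrete, here's a minimal sketch of runtime dispatch in GNU C (gcc/clang on x86 only). The names add_arrays, add_arrays_sse and add_arrays_avx are made up for illustration; __builtin_cpu_supports is the GNU C builtin for checking CPU features at run time.

```c
#include <immintrin.h>
#include <stddef.h>

// Baseline version: uses only what the command line enables (SSE on x86-64).
static void add_arrays_sse(float *dst, const float *a, const float *b, size_t n) {
    size_t i = 0;
    for (; i + 4 <= n; i += 4)
        _mm_storeu_ps(dst + i, _mm_add_ps(_mm_loadu_ps(a + i), _mm_loadu_ps(b + i)));
    for (; i < n; i++)          // scalar tail
        dst[i] = a[i] + b[i];
}

// AVX version: the target attribute enables VEX encoding for this function only.
__attribute__((target("avx")))
static void add_arrays_avx(float *dst, const float *a, const float *b, size_t n) {
    size_t i = 0;
    for (; i + 8 <= n; i += 8)
        _mm256_storeu_ps(dst + i, _mm256_add_ps(_mm256_loadu_ps(a + i), _mm256_loadu_ps(b + i)));
    for (; i < n; i++)          // scalar tail
        dst[i] = a[i] + b[i];
}

// Dispatcher: the AVX code path is only reached when the CPU reports AVX.
void add_arrays(float *dst, const float *a, const float *b, size_t n) {
    if (__builtin_cpu_supports("avx"))
        add_arrays_avx(dst, a, b, n);
    else
        add_arrays_sse(dst, a, b, n);
}
```

Only add_arrays_avx contains VEX-coded instructions, and it is only called after the CPU check. gcc and clang normally emit vzeroupper before returning from a function that used ymm registers, but it's worth checking the asm for your compiler and options.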

With GNU C function attributes, yes. This works with gcc and clang, but apparently not with ICC, even though ICC doesn't reject the attribute syntax.

Obviously it doesn't work with MSVC, which has different command line options anyway. With MSVC, you can compile a file that uses AVX intrinsics without /arch:AVX, but DON'T do that; it will use VEX encoding only for the instructions that aren't encodable at all with legacy SSE, like _mm_permutevar_ps (vpermilps), leading to transition penalties.
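One way to catch that MSVC mistake at build time is a preprocessor guard at the top of any file that uses AVX intrinsics. This is just a sketch: it relies on MSVC predefining __AVX__ when /arch:AVX (or a higher /arch level) is in effect, and the error message text is my own.

```c
// Hypothetical guard for the top of a source file that uses AVX intrinsics.
// MSVC only defines __AVX__ when the file is built with /arch:AVX or higher.
// clang-cl also defines _MSC_VER, so it is excluded here.
#if defined(_MSC_VER) && !defined(__clang__) && !defined(__AVX__)
#error "This file uses AVX intrinsics: compile it with /arch:AVX (or higher)."
#endif
```

With gcc and clang the guard expands to nothing, so it can stay in a file shared between compilers.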

The GNU C way:

#include <immintrin.h>

__m128 addps_sse(__m128 x, __m128 y) {
    return x+y;       // GNU C alternative to _mm_add_ps.
}

__attribute((target("avx")))    // <<<<<<<<<<< This line
__m128 addps_avx(__m128 x, __m128 y) {
    return x+y;
}

Compiled (on the Godbolt compiler explorer) with gcc and clang using -O3 -march=nehalem, which makes SSE4.2 available (and tunes for Nehalem) but doesn't enable AVX:

addps_sse:
        addps   xmm0, xmm1
        ret
addps_avx:
        vaddps  xmm0, xmm0, xmm1
        ret

Both gcc and clang emit identical asm, of course. ICC uses addps (non-VEX) for both versions. I didn't check whether ICC allows _mm256 intrinsics inside the function that has the AVX target attribute, but gcc should.
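For the gcc/clang case, here's a small sketch of what using _mm256 intrinsics inside a target("avx") function could look like while the rest of the file stays at the baseline instruction set. The function name sum8_avx is made up for illustration, and callers still have to make sure the CPU supports AVX before calling it.

```c
#include <immintrin.h>

// Sums 8 floats. Only this function is compiled with AVX enabled.
__attribute__((target("avx")))
float sum8_avx(const float *p) {
    __m256 v  = _mm256_loadu_ps(p);               // 8 floats, unaligned load
    __m128 lo = _mm256_castps256_ps128(v);        // elements 0..3
    __m128 hi = _mm256_extractf128_ps(v, 1);      // elements 4..7
    __m128 s  = _mm_add_ps(lo, hi);               // 4 partial sums
    s = _mm_add_ps(s, _mm_movehl_ps(s, s));       // 2 partial sums in elements 0 and 1
    s = _mm_add_ss(s, _mm_shuffle_ps(s, s, 1));   // element 0 += element 1
    return _mm_cvtss_f32(s);
}
```

Inside this function the 128-bit intrinsics should also get VEX encoding, the same way addps_avx does above, so there's no mixing of encodings within the function itself.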

solution1
4 ACCEPTED 2018-03-12 04:02:32
