
gcc target for AVX2 disabling SSE instruction set

We have one translation unit that we want to compile with AVX2 (only that one). It tells GCC so up front, on the first line of the file:

#pragma GCC target "arch=core-avx2,tune=core-avx2"

This used to work with GCC 4.8 and 4.9, but from GCC 6 onward (I tried 7 and 8 too) we get this warning, which we treat as an error:

error: SSE instruction set disabled, using 387 arithmetics

It is reported on the first function returning a float. I have tried to re-enable SSE 4.2 (along with AVX and AVX2) like so:

#pragma GCC target "sse4.2,arch=core-avx2,tune=core-avx2"

But that is not enough; the error persists.

EDIT:

Relevant compiler flags, we target AVX for most stuff:

-mfpmath=sse,387 -march=corei7-avx -mtune=corei7-avx

EDIT2: minimal sample:

#pragma GCC target "arch=core-avx2,tune=core-avx2"

#include <immintrin.h>
#include <math.h>

static inline float
lg1pf( float x ) {
    return log1pf(x)*1.44269504088896338700465f;
}

int main()
{
  log1pf(2.0f);
}

Compiled like this:

gcc -o test test.c -O2 -Wall -Werror -pedantic -std=c99 -mfpmath=sse,387 -march=corei7-avx -mtune=corei7-avx

In file included from /home/xxx/gcc-7.1.0/lib/gcc/x86_64-pc-linux-gnu/7.1.0/include/immintrin.h:45:0,
                 from test.c:3:
/home/xxx/gcc-7.1.0/lib/gcc/x86_64-pc-linux-gnu/7.1.0/include/avx512fintrin.h: In function ‘_mm_add_round_sd’:
/home/xxx/gcc-7.1.0/lib/gcc/x86_64-pc-linux-gnu/7.1.0/include/avx512fintrin.h:1412:1: error: SSE register return with SSE disabled
 {
 ^

GCC details (I don't have the flags that were used to compile it, though):

gcc --version
gcc (GCC) 7.1.0
Copyright (C) 2017 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

Potential solution

#pragma GCC target "avx2"

This worked for me without other changes to the code.

Related problem: applying an arch= target attribute to individual functions did not work either; only the plain "avx2" form compiles:

__attribute__((__target__("arch=broadwell")))  // does not compile
__m256 use_avx(__m256 a) { return _mm256_add_ps(a,a); }

__attribute__((__target__("avx2,arch=broadwell"))) // does not compile
__m256 use_avx(__m256 a) { return _mm256_add_ps(a,a); }

__attribute__((__target__("avx2"))) // compiles
__m256 use_avx(__m256 a) { return _mm256_add_ps(a,a); }

This looks like a bug. #pragma GCC target before #include <immintrin.h> breaks the header somehow; I don't know why. Even if AVX2 was enabled on the command line with -march=haswell, a #pragma seems to break inlining of any intrinsics defined after that.

You can use #pragma after the header, but then using intrinsics that weren't enabled on the command line fails.

Even a more modern target name like #pragma GCC target "arch=haswell" causes the error, so it's not that the old nebulous target names like corei7-avx are broken in general. They still work on the command line. If you want to enable something for a whole file, the standard way is to use compiler options and not pragmas.
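As a rough illustration of the compiler-options route: the AVX2 file would be built on its own with something like gcc -O2 -march=haswell -c avx2_part.c (the file name is hypothetical), and the rest of the project with its usual flags. A small sketch for checking which way a given file was actually built, assuming only the predefined __AVX2__ macro that GCC and Clang set when AVX2 is enabled; built_with_avx2 is a name made up for this example:

```c
/* Reports whether this translation unit was compiled with AVX2 enabled.
   __AVX2__ is predefined by GCC and Clang when -mavx2, or an -march=
   value that implies it (such as haswell), is in effect.
   The function name is hypothetical, chosen for this sketch. */
int built_with_avx2(void)
{
#ifdef __AVX2__
    return 1;
#else
    return 0;
#endif
}
```

Calling this from the file in question returns 1 only when the build options really enabled AVX2 for that file, which makes a misconfigured build rule easy to spot.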

GCC does claim to support target options on a per-function basis with pragmas or __attribute__, though: https://gcc.gnu.org/onlinedocs/gcc/Function-Specific-Option-Pragmas.html
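A minimal sketch of the per-function form, using the plain "avx2" target string that is reported to compile elsewhere on this page, rather than an arch= string. The function names (add_avx2, add_scalar, add_floats) are made up for illustration, and the runtime check with __builtin_cpu_supports is an addition so the AVX2 path only runs on CPUs that have it:

```c
#include <immintrin.h>
#include <stddef.h>

/* Only this function is compiled with AVX2 enabled; everything else in
   the file keeps the options given on the command line. */
__attribute__((target("avx2")))
static void add_avx2(const float *a, const float *b, float *out, size_t n)
{
    size_t i = 0;
    /* Process 8 floats per iteration with 256-bit vectors. */
    for (; i + 8 <= n; i += 8) {
        __m256 va = _mm256_loadu_ps(a + i);
        __m256 vb = _mm256_loadu_ps(b + i);
        _mm256_storeu_ps(out + i, _mm256_add_ps(va, vb));
    }
    /* Scalar tail for the remaining elements. */
    for (; i < n; ++i)
        out[i] = a[i] + b[i];
}

/* Portable fallback built with the file's default target options. */
static void add_scalar(const float *a, const float *b, float *out, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        out[i] = a[i] + b[i];
}

/* Pick an implementation at run time based on what the CPU supports. */
void add_floats(const float *a, const float *b, float *out, size_t n)
{
    if (__builtin_cpu_supports("avx2"))
        add_avx2(a, b, out, n);
    else
        add_scalar(a, b, out, n);
}
```

Here the header is included with no target pragma in front of it, which is the arrangement that avoids the breakage described above.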


This is as far as I've gotten playing around with this (Godbolt compiler explorer with gcc8.1). Clang is unaffected because it ignores #pragma GCC target. (So that means that #pragma is not very portable; you probably want your code to work with any GNU C compiler, not just gcc itself.)

 // breaks gcc when before immintrin.h
 // #pragma GCC target "arch=haswell"

#include <immintrin.h>
#include <math.h>

//#pragma GCC target "arch=core-avx2,tune=core-avx2"
#pragma GCC target "arch=haswell"

//static inline 
float
lg1pf( float x ) {
    return log1pf(x)*1.44269504088896338700465f;
}

// can accept / return wide vectors
__m128 nop(__m128 a) { return a; }
__m256 require_avx(__m256 a) { return a; }
// but error on using intrinsics if #include happened without target options
//__m256 use_avx(__m256 a) { return _mm256_add_ps(a,a); }

// this works, though, because AVX is enabled at this point
// presumably so would  __builtin_ia32_whatever
// Without `arch=haswell`, this breaks, so we know the pragma "worked"
__m256 use_native_vec(__m256 a) { return a+a; }
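The a+a in use_native_vec relies on GNU C vector extensions rather than on intrinsics from immintrin.h. A small sketch of the same idea with a self-declared vector type, so it does not depend on the header at all; the typedef v8sf and the function double8 are names chosen for this example, and the data goes through pointers so that no wide vector is passed by value across a function boundary:

```c
#include <string.h>

/* 8 floats in 32 bytes; the typedef name is our own choice. */
typedef float v8sf __attribute__((vector_size(32)));

/* out[0..7] = a[0..7] + a[0..7], using the native + operator on vectors.
   The compiler picks whatever instructions the current target allows. */
void double8(const float *a, float *out)
{
    v8sf v;
    memcpy(&v, a, sizeof v);    /* load that is safe for unaligned input */
    v = v + v;
    memcpy(out, &v, sizeof v);  /* store back to plain floats */
}
```

Whether this turns into a single 256-bit add depends on the target options in effect for the function, the same as for the __m256 version above.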
