_mm512_mask_i32logather_pd not available for GNU compiler

Question

I have a codebase that contains AVX-512 intrinsics and was built with the Intel compiler. I am trying to build the same code with the GNU compiler. When compiling with gcc and the -mavx512f flag, I get a declaration error, but only for some AVX-512 intrinsics such as _mm512_mask_i32logather_pd.

Standalone Implementation

#include <iostream>
#include <immintrin.h>

int main() {
    __m512d set = _mm512_undefined_pd();
    __mmask16 msk = 42440;
    __m512i v_index = _mm512_set_epi32(64,66,70,96,98,100,102,104,106,112,114,116,118,120,124,256);
    int scale = 8;
    int count_size = 495*4;
    float *src_ptr = (float*)malloc(count_size*sizeof(float));
    __m512 out_512 = (__m512)_mm512_mask_i32logather_pd(set, msk, v_index, (float*)src_ptr, _MM_SCALE_8);
    return 0;
}

Compiling this standalone example with gcc gives the following error:

error: ‘_mm512_mask_i32logather_pd’ was not declared in this scope; did you mean ‘_mm512_mask_i32gather_pd’?

The same code compiles and runs fine with icc and the -xCORE-AVX512 flag.

Is this because the GNU compiler doesn't support all the AVX512 instructions, even though most of them work fine with the -mavx512f flag?

Relevant information

gcc version - 11.2.0
ubuntu version - 22.04
icc version 2021.6.0

Answer 1

GCC has intrinsics for all AVX-512 instructions. What it doesn't always have is every alternate version of an intrinsic, where the variants differ only in their C semantics and not in the underlying instruction they expose.

I think the only difference from the regular _mm512_mask_i32gather_pd intrinsic (which GCC supports) is that logather takes a __m512i vindex instead of a __m256i, but uses only the low half, hence the lo in the name. (I looked at them in the intrinsics guide: same pseudocode, just a difference in C/C++ function signature, and they're listed as intrinsics for the same single instruction.) There doesn't seem to be a higather intrinsic that includes a shuffle; you need to do the extracting yourself.

vgatherdpd gathers 8 double elements to fill a __m512d, using 32-bit indices. The corresponding 8 indices are only 32 bytes wide in total. That's why the regular, more widely supported intrinsic only takes a __m256i vindex arg.

Your code strangely bothers to initialize 64 bytes (16 indices) without shuffling the high half down. You're also merge-masking into _mm512_undefined_pd(), which seems a weird example. But pretty obviously this isn't intended to be useful, since you're also loading from uninitialized malloc memory. You're casting the result to a __m512, I guess to use this instruction to gather pairs of float instead of individual doubles? If so, yes, it's more efficient to gather fewer elements, but it's a weird way to make a minimal example for an intrinsic you're looking for. I wonder if perhaps you were looking for _mm512_mask_i32gather_ps to gather 16 float elements, merging into a __m512 vector. (The non-_mask_ version gathers all 16 elements, and you don't have to supply a merge target; that's often what you want.)
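As a rough sketch of that guess, the two float gathers might look like the following. The wrapper names gather16_ps and mask_gather16_ps are hypothetical, the scale of 4 assumes the indices count float elements, and the target attribute is a GCC/Clang extension used here only so the snippet builds without -mavx512f. It illustrates the intrinsics and is not the asker's actual code.

```cpp
#include <immintrin.h>
#include <cstdint>

// Hypothetical wrapper: gather all 16 floats table[idx16[i]], no merge target needed.
__attribute__((target("avx512f")))
void gather16_ps(const float* table, const int32_t* idx16, float* out16) {
    __m512i vindex = _mm512_loadu_si512(idx16);        // 16 x 32-bit indices
    __m512 r = _mm512_i32gather_ps(vindex, table, 4);  // scale 4 = sizeof(float)
    _mm512_storeu_ps(out16, r);
}

// Hypothetical wrapper: gather only the lanes whose bit is set in k;
// the other lanes keep the value `fill` from the merge source.
__attribute__((target("avx512f")))
void mask_gather16_ps(const float* table, const int32_t* idx16,
                      uint16_t k, float fill, float* out16) {
    __m512i vindex = _mm512_loadu_si512(idx16);
    __m512 src = _mm512_set1_ps(fill);                 // defined merge source
    __m512 r = _mm512_mask_i32gather_ps(src, (__mmask16)k, vindex, table, 4);
    _mm512_storeu_ps(out16, r);
}
```

Both functions should only be called on a CPU that actually supports AVX-512F. If the whole file is compiled with -mavx512f, the target attribute can be dropped.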

If you do have your 8 indices in a wider vector for some reason (e.g. as a result of computation and you're going to do 2 gathers after shuffling), you can just cast the vector type:

  __m512i vindex = ...;   // the part we want is only the low half
  __m512d result = ...;   // something to merge into
  result = _mm512_mask_i32gather_pd(result, mask, _mm512_castsi512_si256(vindex),
                                    src_ptr, _MM_SCALE_8);
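To make the cast above concrete, here is a minimal sketch of a compatibility helper that behaves like the logather variant while using only the gather intrinsic GCC declares. The names mask_i32logather_pd and gather_low8 are hypothetical (they are not in any compiler's headers), the literal scale 8 assumes the indices count double elements, and the target attribute is a GCC/Clang extension so that the snippet builds without -mavx512f.

```cpp
#include <immintrin.h>
#include <cstdint>

// Hypothetical stand-in for ICC's _mm512_mask_i32logather_pd:
// use only the low 8 of the 16 indices in vindex.
__attribute__((target("avx512f")))
static inline __m512d mask_i32logather_pd(__m512d src, __mmask8 k,
                                          __m512i vindex, const double* base) {
    // The cast just reinterprets the low 256 bits; it is not a shuffle.
    return _mm512_mask_i32gather_pd(src, k, _mm512_castsi512_si256(vindex),
                                    base, 8);  // scale 8 = sizeof(double)
}

// Hypothetical driver: out8[i] = table[idx16[i]] where bit i of k is set,
// otherwise out8[i] = fill. idx16[8..15] are loaded but never used.
__attribute__((target("avx512f")))
void gather_low8(const double* table, const int32_t* idx16,
                 unsigned char k, double fill, double* out8) {
    __m512i vindex = _mm512_loadu_si512(idx16);
    __m512d src = _mm512_set1_pd(fill);
    __m512d r = mask_i32logather_pd(src, (__mmask8)k, vindex, table);
    _mm512_storeu_pd(out8, r);
}
```

Merging into a vector with known contents, rather than _mm512_undefined_pd(), keeps the masked-off lanes well defined.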

Your cast to (float*) in the arg list to the intrinsic makes no sense: it actually takes a void*, so you can gather 64-bit chunks from anything (and yes, it's strict-aliasing and alignment safe, not following C rules). But the normal type would be double*, since this is a _pd gather.

In your example, it would be simpler to just write __m256i vindex = _mm256_setr_epi32(...); (or set, if you like the highest-element-first order for the argument list).
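As a small illustration of the two argument orders: in the question's _mm512_set_epi32 call the last argument (256) lands in element 0, so the low half that the gather would use is 256, 124, 120, 118, 116, 114, 112, 106. The sketch below builds that same 8-index vector both ways. The function names are hypothetical, and the target attribute is a GCC/Clang extension so the snippet builds without -mavx.

```cpp
#include <immintrin.h>
#include <cstdint>

// setr: arguments are listed starting from element 0 (memory order).
__attribute__((target("avx")))
void indices_setr(int32_t* out8) {
    __m256i v = _mm256_setr_epi32(256, 124, 120, 118, 116, 114, 112, 106);
    _mm256_storeu_si256((__m256i*)out8, v);
}

// set: arguments are listed highest element first, so the list is reversed.
__attribute__((target("avx")))
void indices_set(int32_t* out8) {
    __m256i v = _mm256_set_epi32(106, 112, 114, 116, 118, 120, 124, 256);
    _mm256_storeu_si256((__m256i*)out8, v);
}
```

Both functions store identical contents, with 256 in out8[0] and 106 in out8[7].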
