GCC warning with typecasts of NEON array type (e.g. int32x4x2_t to int64x2x2_t)

Consider the following MRE:

#include <arm_neon.h>

void test(int64_t *r) {
  int32x4x2_t vr;

  // Do something actually interesting here instead  
  vr.val[0] = vr.val[1] = vdupq_n_s32(0);
  
  vst2q_s64(r, vr);
}

When compiling this code with gcc 11.2 I get the following error:

test.c: In function ‘test’:
test.c:9:16: error: incompatible type for argument 2 of ‘vst2q_s64’
    9 |   vst2q_s64(r, vr);
      |                ^~
      |                |
      |                int32x4x2_t
In file included from test.c:1:
/usr/lib/gcc/aarch64-linux-gnu/11/include/arm_neon.h:27890:39: note: expected ‘int64x2x2_t’ but argument is of type ‘int32x4x2_t’
27890 | vst2q_s64 (int64_t * __a, int64x2x2_t __val)
      |                           ~~~~~~~~~~~~^~~~~

This is entirely expected, and can be fixed as follows:

#include <arm_neon.h>

void test(int64_t *r) {
  int32x4x2_t vr;

  // Do something actually interesting here instead  
  vr.val[0] = vr.val[1] = vdupq_n_s32(0xDEADBEEF);
  
  vst2q_s64(r, *(int64x2x2_t *)&vr);
}

The code now compiles and actually does what I expect. However, compiling with gcc -O3 -Wall -Wextra -pedantic produces the following warning:

test.c: In function ‘test’:
test.c:9:17: warning: dereferencing type-punned pointer will break strict-aliasing rules [-Wstrict-aliasing]
    9 |   vst2q_s64(r, *(int64x2x2_t *)&vr);
      |   

Assume I don't want to change vr's type int32x4x2_t -- I have excellent reasons for the current choice, related to the code that was omitted from the MRE. I also can't change vst2q_s64 to vst2q_s32, as the resulting memory layout would change.

I'm trying to figure out the "correct" way to cast vr so that this code compiles without warnings. NEON intrinsics such as vreinterpretq_s64_s32 do not work on these array types, e.g. uint32x4x2_t, or at least I can't find ones that do.

Something that does work (although I would appreciate a more compact solution) is to copy each of vr's elements to the corresponding element of a variable of type int64x2x2_t, i.e.

#include <arm_neon.h>

void test(int64_t *r) {
  int32x4x2_t vr;
  int64x2x2_t vrr;

  // Do something actually interesting here instead  
  vr.val[0] = vr.val[1] = vdupq_n_s32(0);

  vrr.val[0] = vreinterpretq_s64_s32(vr.val[0]);
  vrr.val[1] = vreinterpretq_s64_s32(vr.val[1]);
  
  vst2q_s64(r, vrr);
}

This works, but I keep wondering whether there's an intrinsic I've missed, or a different casting syntax that would not cause the warning above.

AFAIK, there are no intrinsics to cast arrays of vectors, but you can write the necessary ones yourself:

#include <arm_neon.h>

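// static inline avoids C99 "extern inline" link errors when the call is not inlined
static inline uint32x4x2_t vreinterpretq_u32_u16_2(uint16x8x2_t a) {
    // Outer braces initialize the struct, inner braces its val[] array
    uint32x4x2_t b = {{
        vreinterpretq_u32_u16(a.val[0]),
        vreinterpretq_u32_u16(a.val[1]),
    }};
    return b;
}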

The other legitimate method requires memcpy:

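#include <arm_neon.h>
#include <string.h>  // memcpy

uint32x4x2_t vreinterpretq_u32_u16_2(uint16x8x2_t a) {
    uint32x4x2_t b;
    // Both types are 32 bytes, so this copies the whole object representation
    memcpy(&b, &a, sizeof(a));
    return b;
}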

With optimization enabled, both of these compile to the same assembly, a single instruction: ret

I can only speculate that these intrinsics are missing because C does not support overloading, so adding them would cause an explosion in the number of function prototypes.

One can also simply write

uint32x4x2_t tmp;
memcpy(&tmp, &original, sizeof(original));

Or maybe this actually calls for a macro:

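// The argument is parenthesized so expressions such as *p work; the double braces fully bracket the struct and its val[] array
#define VCAST2(a, func) {{ func((a).val[0]), func((a).val[1]) }}
#define VCAST3(a, func) {{ func((a).val[0]), func((a).val[1]), func((a).val[2]) }}
#define VCAST4(a, func) {{ func((a).val[0]), func((a).val[1]), func((a).val[2]), func((a).val[3]) }}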

uint32x4x2_t tmp2 = VCAST2(a, vreinterpretq_u32_u16);
uint32x4x3_t tmp3 = VCAST3(b, vreinterpretq_u32_s64);
uint32x2x4_t tmp4 = VCAST4(c, vreinterpret_u32_f32);
