
convert array of uint64_t to __m256i

I have four uint64_t numbers and I wish to combine them as parts of a __m256i , however, I'm lost as to how to go about this.

Here's one attempt (where rax , rbx , rcx , and rdx are uint64_t ):

uint64_t a[4] = {rax, rbx, rcx, rdx};

__m256i t = _mm256_load_si256((__m256i *) &a);

If you already have an array, then yes, absolutely use _mm256_loadu_si256 (or the aligned version, _mm256_load_si256, if your array is alignas(32)). But generally don't create an array just to store into it and reload from it.
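For the "already have an array" case, a minimal sketch could look like the following. The helper name roundtrip_unaligned is made up for illustration, and the target("avx") attribute is a GCC/Clang extension used here only so the snippet builds without -mavx; it assumes an x86 CPU with AVX.

```c
#include <immintrin.h>
#include <stdint.h>

/* Hypothetical helper: moves four uint64_t values through a 256-bit register.
   target("avx") is a GCC/Clang attribute; with -mavx on the command line
   it would not be needed. */
__attribute__((target("avx")))
void roundtrip_unaligned(const uint64_t src[4], uint64_t dst[4])
{
    /* loadu has no alignment requirement, so a plain uint64_t array is fine. */
    __m256i v = _mm256_loadu_si256((const __m256i *)src);

    /* Store back unaligned as well, so dst can be any uint64_t[4]. */
    _mm256_storeu_si256((__m256i *)dst, v);
}
```

Element 0 of the array ends up in the low 64 bits of the vector, so the store gives back the values in the same order.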


Use the _mm256_set_* intrinsics and let the compiler decide how to do it. Note that they take their args with the highest-numbered element first, e.g.:

__m256i vt = _mm256_set_epi64x(rdx, rcx, rbx, rax);
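To check the argument order yourself, something like this sketch can help. pack4 is a hypothetical wrapper (not part of any library), and target("avx") is again a GCC/Clang-only convenience; an AVX-capable x86 CPU is assumed.

```c
#include <immintrin.h>
#include <stdint.h>

/* Hypothetical wrapper: packs four scalars and writes them out in memory order. */
__attribute__((target("avx")))
void pack4(uint64_t rax, uint64_t rbx, uint64_t rcx, uint64_t rdx,
           uint64_t out[4])
{
    /* Highest element first: rdx goes to bits 255:192, rax to bits 63:0. */
    __m256i vt = _mm256_set_epi64x((long long)rdx, (long long)rcx,
                                   (long long)rbx, (long long)rax);

    /* Memory order is lowest element first, so out[0] receives rax. */
    _mm256_storeu_si256((__m256i *)out, vt);
}
```

If the reversed order is confusing, _mm256_setr_epi64x takes the same arguments lowest element first.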

You typically don't want the asm to look anything like your scalar store -> vector load C source, because that would produce a store-forwarding stall.

gcc 6.1 "sees through" the local array in this case (and uses 2x vmovq / 2x vpinsrq / 1x vinserti128 ), but it still generates code to align the stack to 32B. (Even though it's not needed because it didn't end up needing any 32B-aligned locals).

As you can see on the Godbolt Compiler Explorer , the actual data-movement part of both ways is the same, but the array way has a bunch of wasted instructions that gcc failed to optimize away after deciding to avoid the bad way that the source was implying.

_mm256_set_epi64x also works in 32-bit code (with gcc at least). There you get 2x vmovq and 2x vmovhps, the latter doing 64-bit loads into the upper half of an xmm register. (Add -m32 to the compile options in the Godbolt link to see this.)

Firstly, make sure your CPU even supports these AVX instructions; see the question "Performing AVX integer operation".
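One way to do that check at run time, assuming GCC or Clang on x86 (the __builtin_cpu_supports builtin is not available in MSVC), is sketched below. The function names have_avx and have_avx2 are made up for this example. The loads and _mm256_set_epi64x discussed here only need AVX, while 256-bit integer arithmetic such as _mm256_add_epi64 needs AVX2.

```c
/* Hypothetical helpers built on the GCC/Clang x86 builtin.
   Each returns 1 if the running CPU reports the feature, 0 otherwise. */
int have_avx(void)
{
    return __builtin_cpu_supports("avx") != 0;
}

int have_avx2(void)
{
    return __builtin_cpu_supports("avx2") != 0;
}
```

A program could call have_avx() once at startup and fall back to scalar code when it returns 0.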

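Secondly, according to https://software.intel.com/en-us/node/514151 , the pointer argument of _mm256_load_si256 must point to a 32-byte aligned location. An ordinary array on the stack is only guaranteed the alignment of its element type; whether it happens to land on a 32-byte boundary depends on the layout of the stack frame and of the frames from previous calls, so it may not be aligned.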

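Instead, either declare the storage with the intrinsic type __m256i, which forces the compiler to align it, or, according to https://software.intel.com/en-us/node/582952 , use __declspec(align(32)) on your a array (alignas(32) is the standard C11/C++11 way to request the same alignment).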
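A minimal sketch of the aligned-array variant, assuming a C11 compiler (alignas comes from <stdalign.h>) and GCC/Clang for the target("avx") attribute. The function name roundtrip_aligned is hypothetical.

```c
#include <immintrin.h>
#include <stdalign.h>
#include <stdint.h>

/* Hypothetical helper: same array-then-load idea as the question,
   but with the array explicitly aligned for _mm256_load_si256. */
__attribute__((target("avx")))
void roundtrip_aligned(uint64_t rax, uint64_t rbx, uint64_t rcx, uint64_t rdx,
                       uint64_t dst[4])
{
    /* alignas(32) guarantees the 32-byte alignment the aligned load requires. */
    alignas(32) uint64_t a[4] = {rax, rbx, rcx, rdx};

    __m256i t = _mm256_load_si256((const __m256i *)a);

    /* dst comes from the caller with unknown alignment, so store unaligned. */
    _mm256_storeu_si256((__m256i *)dst, t);
}
```

Without the alignas(32), the aligned load may fault whenever the array does not happen to sit on a 32-byte boundary.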
