简体   繁体   中英

What's the most efficient way to load and extract 32 bit integer values from a 128 bit SSE vector?

I'm trying to optimize my code using SSE intrinsics but am running into a problem where I don't know of a good way to extract the integer values from a vector after I've done the SSE intrinsics operations to get what I want.

Does anyone know of a good way to do this? I'm programming in C and my compiler is gcc version 4.3.2.

Thanks for all your help.

It depends on what you can assume about the minimum level of SSE support that you have.

Going all the way back to SSE2 you have _mm_extract_epi16 ( PEXTRW ) which can be used to extract any 16 bit element from a 128 bit vector. You would need to call this twice to get the two halves of a 32 bit element.

In more recent versions of SSE (SSE4.1 and later) you have _mm_extract_epi32 ( PEXTRD ) which can extract a 32 bit element in one instruction.

Alternatively if this is not inside a performance-critical loop you can just use a union, eg

typedef union
{
    __m128i v;
    int32_t a[4];
} U32;
_mm_extract_epi32

The extract intrinsics is indeed the best option but if you need to support SSE2, I'd recommend this:

inline int get_x(const __m128i& vec){return _mm_cvtsi128_si32 (vec);}
inline int get_y(const __m128i& vec){return _mm_cvtsi128_si32 (_mm_shuffle_epi32(vec,0x55));}
inline int get_z(const __m128i& vec){return _mm_cvtsi128_si32 (_mm_shuffle_epi32(vec,0xAA));}
inline int get_w(const __m128i& vec){return _mm_cvtsi128_si32 (_mm_shuffle_epi32(vec,0xFF));}

I've found that if you reinterpret_cast/union the vector to any int[4] representation the compiler tends to flush things back to memory (which may not be that bad) and reads it back as an int, though I haven't looked at the assembly to see if the latest versions of the compilers generate better code.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM