
How to load bytes into a __m128i at a specific position

I need to load 4 bytes stored consecutively in an array into specific positions of a __m128i variable, so that I can do many int32_t sums, 4 at a time, storing all partial results.

For example:

#include <emmintrin.h>  // SSE2 intrinsics
#include <cstdint>

const unsigned int SIZE = 2000000;
const unsigned int STEP = 100;

unsigned char* inBuffer = new unsigned char[SIZE];
//Fill inBuffer
const unsigned char* a = inBuffer;

int32_t* outBuffer = new int32_t[SIZE/STEP*4];
int32_t* result = outBuffer;

__m128i sum = _mm_setzero_si128();
for (unsigned int i = 0; i < SIZE; i += STEP) {
    // Widen 4 consecutive bytes to four 32-bit lanes (a[0] in the lowest lane)
    __m128i value = _mm_set_epi32(a[3], a[2], a[1], a[0]);
    sum = _mm_add_epi32(sum, value);
    _mm_storeu_si128((__m128i*)result, sum);  // store the running partial sums
    a += STEP;
    result += 4;
}

//Print outBuffer

delete[] inBuffer;
delete[] outBuffer;

I was wondering if there is a more efficient way to do this.

The main problem here, of course, is this line:

__m128i value = _mm_set_epi32 (a[3],a[2],a[1],a[0]);

However, a decent compiler should generate fairly efficient code for this. Take a look at the output (gcc -O3 -S ...); if it's more than a few instructions, you may want to consider doing the load/unpack operations yourself.
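As a rough illustration of what "doing the load/unpack operations yourself" could look like, here is a minimal SSE2 sketch. The helper name load4_u8_as_epi32 is hypothetical (it does not come from the question or any library), and the sketch assumes the bytes should be zero-extended, as they are for unsigned char in the original _mm_set_epi32 call. Whether it actually beats the compiler's output for _mm_set_epi32 would need to be measured.

```cpp
#include <emmintrin.h>  // SSE2 intrinsics
#include <cstdint>
#include <cstring>

// Hypothetical helper (name is an assumption): widen 4 consecutive bytes
// to four zero-extended 32-bit lanes, with p[0] in the lowest lane.
// Intended to match _mm_set_epi32(p[3], p[2], p[1], p[0]).
static inline __m128i load4_u8_as_epi32(const unsigned char* p) {
    std::int32_t packed;
    std::memcpy(&packed, p, sizeof packed);    // single 32-bit load, no alignment requirement
    __m128i v = _mm_cvtsi32_si128(packed);     // 4 bytes in the low 32 bits, upper bits zeroed
    const __m128i zero = _mm_setzero_si128();
    v = _mm_unpacklo_epi8(v, zero);            // interleave with zero: 8-bit -> 16-bit
    return _mm_unpacklo_epi16(v, zero);        // interleave with zero: 16-bit -> 32-bit
}
```

In the loop from the question, the _mm_set_epi32 line would then become __m128i value = load4_u8_as_epi32(a);. If SSE4.1 can be assumed, _mm_cvtepu8_epi32 (from smmintrin.h) performs both unpack steps in one instruction.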
