
SSE Instructions

I have a question regarding SSE instructions.

I hope this is the right place to ask such a question; if not, please let me know and I will remove it.

My goal is to use SSE instructions to execute calculations on 3 chars in parallel.

I have a struct typedef that is declared packed:

typedef struct
{
        unsigned char x;
        unsigned char y;
        unsigned char z;
} __attribute__((packed)) Number;

For each char I have to go through a certain calculation.

As an example:

((Number[0].x * 20)  / 256);

I have to do a small calculation for every char and then add them together.

Since I have to write the code in assembly, I did some research and stumbled upon this intrinsic:

__m128i _mm_add_epi8 (__m128i a, __m128i b)

As far as I understand, this adds two values (each 8 bytes in size) together and stores the result.

At least that's how I read it from this link.

But if it only adds two values together, that defeats the whole purpose of doing multiple calculations at once.

Any help would be very much appreciated. Kind regards!

If you could provide more information about how you're actually using this, it might be possible to optimize it better, but based on what you wrote I guess you'd want something like _mm_srli_epi32(_mm_mullo_epi32(_mm_set_epi32(nx, ny, nz, 0), _mm_set1_epi32(20)), 8). Here nx, ny and nz are the three chars; each is multiplied by 20 in its own 32-bit lane, and the right shift by 8 is the division by 256. _mm_mullo_epi32 requires SSE4.1, but if you want something that works with SSE2, see "SSE multiplication of 4 32-bit integers" for a _mm_mullo_epi32 replacement.
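A minimal sketch of that expression wrapped in a function, assuming GCC or Clang on x86. The struct is the one from the question; the helper name scale_number and the target attribute (used so the file builds without -msse4.1) are illustrative assumptions, not part of the original suggestion.

```c
#include <smmintrin.h>  /* SSE4.1 header; also pulls in the SSE2 intrinsics */

typedef struct {
    unsigned char x;
    unsigned char y;
    unsigned char z;
} __attribute__((packed)) Number;

/* Hypothetical helper: computes (v * 20) / 256 for x, y and z at once.
   The target attribute enables SSE4.1 for this function only (GCC/Clang). */
__attribute__((target("sse4.1")))
static __m128i scale_number(Number n)
{
    /* _mm_set_epi32 takes its arguments from the highest lane down:
       lane 3 = x, lane 2 = y, lane 1 = z, lane 0 = unused (0). */
    __m128i v = _mm_set_epi32(n.x, n.y, n.z, 0);

    /* Multiply every 32-bit lane by 20 (low 32 bits of each product). */
    __m128i prod = _mm_mullo_epi32(v, _mm_set1_epi32(20));

    /* Logical shift right by 8 bits, i.e. unsigned division by 256. */
    return _mm_srli_epi32(prod, 8);
}
```

For a Number holding x = 200, y = 100, z = 13, lanes 3, 2 and 1 of the returned vector would hold 15, 7 and 1.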

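You didn't specify what you want to do with the result, but if you store it in a variable such as r_sse, you can use something like ((int*) &r_sse)[i] to access the individual results, where i is 1 for z, 2 for y, and 3 for x (_mm_set_epi32 takes its arguments from the highest element down, so element 0 is the padding 0).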
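Since the question mentions adding the per-char results together, here is a small sketch of reading the lanes back out and summing them. It swaps the pointer cast for a store into a plain int array via _mm_storeu_si128, which needs only SSE2. The helper names unpack_lanes and sum_xyz and the LANE_* constants are made up for illustration; the lane order assumes the vector was built with _mm_set_epi32(nx, ny, nz, 0).

```c
#include <emmintrin.h>  /* SSE2: __m128i, _mm_storeu_si128 */

/* Lane indices matching _mm_set_epi32(nx, ny, nz, 0); lane 0 is padding. */
enum { LANE_Z = 1, LANE_Y = 2, LANE_X = 3 };

/* Hypothetical helper: copies the four 32-bit lanes of r_sse into out[0..3]. */
static void unpack_lanes(__m128i r_sse, int out[4])
{
    /* Unaligned store, so out needs no special alignment; lane 0 lands in out[0]. */
    _mm_storeu_si128((__m128i *)out, r_sse);
}

/* Hypothetical helper: adds the x, y and z results, skipping the padding lane. */
static int sum_xyz(__m128i r_sse)
{
    int out[4];
    unpack_lanes(r_sse, out);
    return out[LANE_X] + out[LANE_Y] + out[LANE_Z];
}
```

With a result vector whose lanes 3, 2 and 1 hold 15, 7 and 1, sum_xyz would return 23.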
