
How do I convert __m128i to an unsigned int with SSE?

I have made a function for posterizing images:

// =(
#define ARGB_COLOR(a, r, g, b) (((a) << 24) | ((r) << 16) | ((g) << 8) | (b))

inline UINT PosterizeColor(const UINT &color, const float &nColors)
{
    __m128 clr = _mm_cvtepi32_ps(  _mm_cvtepu8_epi32((__m128i&)color)  );

    clr = _mm_mul_ps(clr,  _mm_set_ps1(nColors / 255.0f)  );
    clr = _mm_round_ps(clr, _MM_FROUND_TO_NEAREST_INT);
    clr = _mm_mul_ps(clr, _mm_set_ps1(255.0f / nColors)  );

    __m128i iClr = _mm_cvttps_epi32(clr);

    return ARGB_COLOR(iClr.m128i_u8[12],
                      iClr.m128i_u8[8],
                      iClr.m128i_u8[4],
                      iClr.m128i_u8[0]);
}

In the first line, I unpack the color into 4 floats, but I can't find the proper way to do the reverse.

I searched through the SSE docs and could not find the reverse of _mm_cvtepu8_epi32.

Does one exist?

A combination of _mm_shuffle_epi8 and _mm_cvtsi128_si32 is what you need:

static const __m128i shuffleMask = _mm_setr_epi8(0,  4,  8, 12, -1, -1, -1, -1,
                                               -1, -1, -1, -1, -1, -1, -1, -1);
UINT color = _mm_cvtsi128_si32(_mm_shuffle_epi8(iClr, shuffleMask));
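For context, here is a minimal sketch of how that shuffle could slot into the whole function. It is not the asker's exact code. The helper name PackLowBytes and the TARGET_SSE41 macro are hypothetical, std::uint32_t stands in for UINT, and the color is loaded with _mm_cvtsi32_si128 rather than the reference cast from the question, since that cast reads 16 bytes from a 4-byte object.

```cpp
#include <cstdint>
#include <smmintrin.h>  // SSE4.1; also brings in the SSSE3 and SSE2 intrinsics

// Hypothetical convenience macro: lets GCC/Clang compile these functions
// without a global -msse4.1 flag. It expands to nothing on other compilers.
#if defined(__GNUC__) || defined(__clang__)
#define TARGET_SSE41 __attribute__((target("sse4.1")))
#else
#define TARGET_SSE41
#endif

// Hypothetical helper: gathers the low byte of each 32-bit lane into the
// low 32 bits of the register, then moves those bits to a scalar.
TARGET_SSE41 inline std::uint32_t PackLowBytes(__m128i lanes)
{
    // Bytes 0, 4, 8, 12 go to positions 0..3; -1 zeroes the remaining bytes.
    const __m128i shuffleMask = _mm_setr_epi8(0,  4,  8, 12, -1, -1, -1, -1,
                                              -1, -1, -1, -1, -1, -1, -1, -1);
    return static_cast<std::uint32_t>(
        _mm_cvtsi128_si32(_mm_shuffle_epi8(lanes, shuffleMask)));
}

TARGET_SSE41 inline std::uint32_t PosterizeColor(std::uint32_t color, float nColors)
{
    // Move the 4 color bytes into the low lane, then widen each byte to a float.
    const __m128i packed = _mm_cvtsi32_si128(static_cast<int>(color));
    __m128 clr = _mm_cvtepi32_ps(_mm_cvtepu8_epi32(packed));

    // Quantize each channel to nColors levels and scale back to 0..255.
    clr = _mm_mul_ps(clr, _mm_set1_ps(nColors / 255.0f));
    clr = _mm_round_ps(clr, _MM_FROUND_TO_NEAREST_INT);
    clr = _mm_mul_ps(clr, _mm_set1_ps(255.0f / nColors));

    return PackLowBytes(_mm_cvttps_epi32(clr));
}
```

The shuffle keeps the channels in their original byte order, so the result matches what the ARGB_COLOR macro in the question would assemble from lanes 3, 2, 1 and 0.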

Unfortunately, there's no instruction to do that even in AVX (none that I'm aware of), so you will have to do it manually like you are right now.

However, your current method is very sub-optimal, and you're relying on .m128i_u8, which is an MSVC extension. Based on my experience with MSVC, it will use an aligned buffer to access the individual elements. This carries a very heavy penalty because of partial-word access.

Instead of .m128i_u8, use _mm_extract_epi32(). This is in SSE4.1, but you're already relying on SSE4.1 with _mm_cvtepu8_epi32().
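A minimal sketch of that replacement, assuming the same lane order as the question (lane 3 is alpha, lane 0 is blue). The function name PackWithExtract and the TARGET_SSE41 macro are hypothetical, and std::uint32_t stands in for UINT.

```cpp
#include <cstdint>
#include <smmintrin.h>  // SSE4.1: _mm_extract_epi32

// Hypothetical convenience macro: lets GCC/Clang compile this function
// without a global -msse4.1 flag. It expands to nothing on other compilers.
#if defined(__GNUC__) || defined(__clang__)
#define TARGET_SSE41 __attribute__((target("sse4.1")))
#else
#define TARGET_SSE41
#endif

// Reads each 32-bit lane with _mm_extract_epi32 instead of .m128i_u8 and
// reassembles the four channels with shifts, as the ARGB_COLOR macro does.
TARGET_SSE41 inline std::uint32_t PackWithExtract(__m128i lanes)
{
    // The lane index must be a compile-time constant.
    const std::uint32_t b = static_cast<std::uint32_t>(_mm_extract_epi32(lanes, 0)) & 0xFFu;
    const std::uint32_t g = static_cast<std::uint32_t>(_mm_extract_epi32(lanes, 1)) & 0xFFu;
    const std::uint32_t r = static_cast<std::uint32_t>(_mm_extract_epi32(lanes, 2)) & 0xFFu;
    const std::uint32_t a = static_cast<std::uint32_t>(_mm_extract_epi32(lanes, 3)) & 0xFFu;
    return (a << 24) | (r << 16) | (g << 8) | b;
}
```

The masking with 0xFF mirrors the byte-sized reads of .m128i_u8. It could be dropped if every lane is already known to be in the 0..255 range.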

This situation is particularly bad since you're working with 1-byte granularity. If you were working with 2-byte (16-bit integer) granularity instead, there is an efficient solution using shuffle intrinsics.
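As an alternative that none of the answers here mention, the byte-granularity case can also be sketched with the SSE2 saturating pack intrinsics, which narrow 32-bit lanes to 16 bits and then to 8 bits. This is a swapped-in technique, not the shuffle-based solution referred to above. The function name PackWithSaturation is hypothetical, and std::uint32_t stands in for UINT.

```cpp
#include <cstdint>
#include <emmintrin.h>  // SSE2: _mm_packs_epi32, _mm_packus_epi16

// Narrows four 32-bit lanes to four bytes with saturation and returns them
// as one 32-bit value (lane 0 ends up in the lowest byte).
inline std::uint32_t PackWithSaturation(__m128i lanes)
{
    // 32-bit -> 16-bit with signed saturation; values 0..255 pass unchanged.
    const __m128i words = _mm_packs_epi32(lanes, lanes);
    // 16-bit -> 8-bit with unsigned saturation: negatives become 0 and
    // values above 255 become 255.
    const __m128i bytes = _mm_packus_epi16(words, words);
    return static_cast<std::uint32_t>(_mm_cvtsi128_si32(bytes));
}
```

Unlike the shuffle approach, this clamps out-of-range lanes instead of keeping only their low byte, which may or may not be the behaviour you want for color channels.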

