简体   繁体   中英

How to convert from 32-bit to 16-bit unsigned integers in AVX2?

I use _mm256_cvtps_epi32() to convert from 8 float s to 8x32-bit integers. But the goal is to get to 16-bit unsigned integers. I have 2 vectors a0 and a1 , each of __m256i type. What is the fastest way to pack them so that 16-bit equivalents of a0 get into the lower 128 bits of the result, and equivalents of a1 get into the higher 128 bits?

Here's what I've got so far, where p0 and p1 are two __m256 vectors of 8 float s each:

const __m256i vShuffle = _mm256_setr_epi8(
  0, 1, 4, 5, 8, 9, 12, 13, -1, -1, -1, -1, -1, -1, -1, -1,
  -1, -1, -1, -1, -1, -1, -1, -1, 0, 1, 4, 5, 8, 9, 12, 13);
const __m256i a0 = _mm256_cvtps_epi32(p0);
const __m256i a1 = _mm256_cvtps_epi32(p1);
const __m256i b0 = _mm256_shuffle_epi8(a0, vShuffle);
const __m256i b1 = _mm256_shuffle_epi8(a1, vShuffle);
const __m128i c0 = _mm_or_si128(_mm256_extracti128_si256(b0, 0), _mm256_extracti128_si256(b0, 1));
const __m128i c1 = _mm_or_si128(_mm256_extracti128_si256(b1, 0), _mm256_extracti128_si256(b1, 1));
return _mm256_setr_m128i(c0, c1);

I didn't test that code but it should do the trick for you:

__m256i tmp1 = _mm256_cvtps_epi32(p0);
__m256i tmp2 = _mm256_cvtps_epi32(p1);
tmp1 = _mm256_packus_epi32(tmp1, tmp2);
tmp1 = _mm256_permute4x64_epi64(tmp1, 0xD8);
// _mm256_store_si256 this

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM