How to convert from 32-bit to 16-bit unsigned integers in AVX2?

Question

I use _mm256_cvtps_epi32() to convert 8 floats to 8 32-bit integers, but the goal is to end up with 16-bit unsigned integers. I have two vectors, a0 and a1, each of type __m256i. What is the fastest way to pack them so that the 16-bit equivalents of a0 land in the lower 128 bits of the result and the 16-bit equivalents of a1 land in the upper 128 bits?

Here's what I have so far, where p0 and p1 are two __m256 vectors of 8 floats each:

const __m256i vShuffle = _mm256_setr_epi8(
  0, 1, 4, 5, 8, 9, 12, 13, -1, -1, -1, -1, -1, -1, -1, -1,
  -1, -1, -1, -1, -1, -1, -1, -1, 0, 1, 4, 5, 8, 9, 12, 13);
const __m256i a0 = _mm256_cvtps_epi32(p0);
const __m256i a1 = _mm256_cvtps_epi32(p1);
const __m256i b0 = _mm256_shuffle_epi8(a0, vShuffle);
const __m256i b1 = _mm256_shuffle_epi8(a1, vShuffle);
const __m128i c0 = _mm_or_si128(_mm256_extracti128_si256(b0, 0), _mm256_extracti128_si256(b0, 1));
const __m128i c1 = _mm_or_si128(_mm256_extracti128_si256(b1, 0), _mm256_extracti128_si256(b1, 1));
return _mm256_setr_m128i(c0, c1);

Answer 1

I haven't tested this code, but it should do the trick:

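__m256i tmp1 = _mm256_cvtps_epi32(p0);
__m256i tmp2 = _mm256_cvtps_epi32(p1);
// Pack with unsigned saturation. This works per 128-bit lane, so the
// 64-bit chunks come out ordered as a0.lo, a1.lo, a0.hi, a1.hi.
tmp1 = _mm256_packus_epi32(tmp1, tmp2);
// 0xD8 selects chunks 0, 2, 1, 3, giving a0.lo, a0.hi, a1.lo, a1.hi.
tmp1 = _mm256_permute4x64_epi64(tmp1, 0xD8);
// tmp1 now holds the result; write it out with _mm256_store_si256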
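
To see why the 0xD8 permute is needed after the pack, here is a rough scalar model of the two instructions in plain C. It is only a sketch for reasoning about element order and saturation, not the vectorized code itself, and the names sat_u16, model_packus_epi32, model_permute4x64 and model_pack_u16 are hypothetical helpers made up for this illustration.

```c
#include <stdint.h>

/* Scalar model of one packed element: signed 32-bit to unsigned 16-bit
   with saturation (negative values become 0, values above 65535 become 65535). */
static uint16_t sat_u16(int32_t v) {
    if (v < 0) return 0;
    if (v > 65535) return 65535;
    return (uint16_t)v;
}

/* Scalar model of _mm256_packus_epi32(a, b). Each 128-bit lane is packed
   independently, so the output order is a[0..3], b[0..3], a[4..7], b[4..7]. */
static void model_packus_epi32(const int32_t a[8], const int32_t b[8],
                               uint16_t out[16]) {
    for (int lane = 0; lane < 2; ++lane) {
        for (int i = 0; i < 4; ++i) {
            out[lane * 8 + i]     = sat_u16(a[lane * 4 + i]);
            out[lane * 8 + 4 + i] = sat_u16(b[lane * 4 + i]);
        }
    }
}

/* Scalar model of _mm256_permute4x64_epi64(v, imm): output 64-bit chunk q
   is input chunk (imm >> (2 * q)) & 3. Each chunk holds four 16-bit values. */
static void model_permute4x64(const uint16_t in[16], int imm,
                              uint16_t out[16]) {
    for (int q = 0; q < 4; ++q) {
        int src = (imm >> (2 * q)) & 3;
        for (int i = 0; i < 4; ++i) {
            out[q * 4 + i] = in[src * 4 + i];
        }
    }
}

/* Hypothetical helper: pack followed by the 0xD8 permute from the answer. */
static void model_pack_u16(const int32_t a0[8], const int32_t a1[8],
                           uint16_t out[16]) {
    uint16_t tmp[16];
    model_packus_epi32(a0, a1, tmp);
    model_permute4x64(tmp, 0xD8, out);
}
```

Calling model_pack_u16 with two 8-element arrays fills out[0..7] from a0 and out[8..15] from a1, which is the layout the question asks for. Note that the pack saturates rather than truncates, so out-of-range inputs give different results from the byte-shuffle approach in the question, which keeps only the low 16 bits of each element.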
