Global bitwise shift of 128, 256, 512 bit registers using intrinsics?

Consider an array of 64-bit unsigned integers, like:

std::array<unsigned long long int, 20> a;

What is the fastest way, including Intel or compiler intrinsics (using g++ 5.3), to perform a global bit shift (right or left) as if this array were a single large integer?
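To pin down the operation being asked about, here is a minimal portable sketch with no intrinsics. The names `shiftLeftScalar` and `shiftRightScalar` are hypothetical, element 0 is assumed to be the least significant word, and `count` is assumed to be below 64. It is meant as a correctness baseline to compare faster versions against, not as the fastest approach.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <limits>

static_assert(std::numeric_limits<unsigned long long>::digits == 64,
              "this sketch assumes 64-bit unsigned long long");

// Hypothetical helper: shift the whole array left by count bits (0 <= count < 64),
// treating a[0] as the least significant word.
template <std::size_t N>
void shiftLeftScalar(std::array<unsigned long long, N>& a, unsigned count) {
    static_assert(N > 0, "array must not be empty");
    assert(count < 64);
    if (count == 0) return;  // avoids an undefined shift by 64 below
    // Walk from the most significant word down so each source word is still unshifted.
    for (std::size_t i = N - 1; i > 0; --i) {
        a[i] = (a[i] << count) | (a[i - 1] >> (64 - count));
    }
    a[0] <<= count;
}

// Hypothetical helper: shift the whole array right by count bits (0 <= count < 64).
template <std::size_t N>
void shiftRightScalar(std::array<unsigned long long, N>& a, unsigned count) {
    static_assert(N > 0, "array must not be empty");
    assert(count < 64);
    if (count == 0) return;
    // Walk from the least significant word up for the same reason.
    for (std::size_t i = 0; i + 1 < N; ++i) {
        a[i] = (a[i] >> count) | (a[i + 1] << (64 - count));
    }
    a[N - 1] >>= count;
}
```

Shifts of 64 bits or more could be handled by first moving whole words and then applying one of these helpers to the remainder.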

You might want to look at std::bitset, which is a container for a number of bits known at compile time. If I'm understanding your question right, that is what you are trying to simulate with your array. The bitset class includes overloaded >> and << operators to perform bit shifts, and those implementations may be optimised in your compiler/standard library combination.
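As a rough sketch of the std::bitset suggestion: the size below matches the 20-word array from the question, and `fromWords` is a hypothetical helper, needed because std::bitset can be constructed from a single `unsigned long long` but not directly from an array of words. Whether the shift operators end up faster than hand-written code depends on the standard library, so it is worth measuring.

```cpp
#include <array>
#include <bitset>
#include <cstddef>
#include <limits>

constexpr std::size_t kWords = 20;            // same length as the array in the question
constexpr std::size_t kBits  = kWords * 64;
using Bits = std::bitset<kBits>;

static_assert(std::numeric_limits<unsigned long long>::digits == 64,
              "this sketch assumes 64-bit unsigned long long");

// Hypothetical helper: pack the words (least significant first) into a bitset.
inline Bits fromWords(const std::array<unsigned long long, kWords>& words) {
    Bits bits;
    for (std::size_t i = 0; i < kWords; ++i) {
        // Place word i at bit offset 64 * i.
        bits |= Bits(words[i]) << (64 * i);
    }
    return bits;
}
```

With this in place, `bits <<= n` and `bits >>= n` shift the whole 1280-bit value, carrying bits across the 64-bit word boundaries.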

Here are some x86 left shift functions that use xmm and ymm registers through intrinsics. It shouldn't be too hard to make corresponding right shift functions. They are taken from a software LFSR benchmark:

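#include <immintrin.h>   // SSE2 and AVX2 intrinsics used below

//----------------------------------------------------------------------------
// bit shift left a 128-bit value using xmm registers
//          __m128i *data - data to shift
//          int count     - number of bits to shift (0..32: the shuffles below
//                          assume each carry fits in the low dword of its qword)
// return:  __m128i       - carry out bit(s)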

static __m128i bitShiftLeft128xmm (__m128i *data, int count)
   {
   __m128i innerCarry, carryOut;

   innerCarry = _mm_srli_epi64 (*data, 64 - count);      // carry outs in the low bits of each qword
   carryOut   = _mm_shuffle_epi32 (innerCarry, 0xFE);    // upper carry in low dword of xmm, others zero
   innerCarry = _mm_shuffle_epi32 (innerCarry, 0xCF);    // lower carry at xmm bit 64, others zero
   *data = _mm_slli_epi64 (*data, count);                // shift all qwords left
   *data = _mm_or_si128 (*data, innerCarry);             // propagate carry out from low qword
   return carryOut;
   }

//----------------------------------------------------------------------------
// bit shift left a 256-bit value using xmm registers
//          __m128i *data - data to shift, ls part stored first 
//          int count     - number of bits to shift
// return:  __m128i       - carry out bit(s)

static __m128i bitShiftLeft256xmm (__m128i *data, int count)
   {
   __m128i carryOut0, carryOut1;

   carryOut0 = bitShiftLeft128xmm (&data [0], count);
   carryOut1 = bitShiftLeft128xmm (&data [1], count);
   data [1] = _mm_or_si128 (data [1], carryOut0);
   return carryOut1;
   }

//----------------------------------------------------------------------------
// bit shift left a 512-bit value using xmm registers
//          __m128i *data - data to shift, ls part stored first 
//          int count     - number of bits to shift
// return:  __m128i       - carry out bit(s)

static __m128i bitShiftLeft512xmm (__m128i *data, int count)
   {
   __m128i carryOut0, carryOut1;

   carryOut0 = bitShiftLeft256xmm (&data [0], count);
   carryOut1 = bitShiftLeft256xmm (&data [2], count);
   data [2] = _mm_or_si128 (data [2], carryOut0);
   return carryOut1;
   }


//----------------------------------------------------------------------------
// bit shift left a 256-bit value using ymm registers (requires AVX2)
//          __m256i *data - data to shift
//          int count     - number of bits to shift
// return:  __m256i       - carry out bit(s)

static __m256i bitShiftLeft256ymm (__m256i *data, int count)
   {
   __m256i innerCarry, carryOut, rotate;

   innerCarry = _mm256_srli_epi64 (*data, 64 - count);                        // carry outs in the low bits of each qword
   rotate     = _mm256_permute4x64_epi64 (innerCarry, 0x93);                  // rotate ymm left 64 bits
   innerCarry = _mm256_blend_epi32 (_mm256_setzero_si256 (), rotate, 0xFC);   // clear lower qword
   *data      = _mm256_slli_epi64 (*data, count);                             // shift all qwords left
   *data      = _mm256_or_si256 (*data, innerCarry);                          // propagate carries from low qwords
   carryOut   = _mm256_xor_si256 (innerCarry, rotate);                        // clear all except lower qword
   return carryOut;
   }

//----------------------------------------------------------------------------
// bit shift left a 512-bit value using ymm registers
//          __m256i *data - data to shift, ls part stored first 
//          int count     - number of bits to shift
// return:  __m256i       - carry out bit(s)

static __m256i bitShiftLeft512ymm (__m256i *data, int count)
   {
   __m256i carryOut0, carryOut1;

   carryOut0 = bitShiftLeft256ymm (&data [0], count);
   carryOut1 = bitShiftLeft256ymm (&data [1], count);
   data [1] = _mm256_or_si256 (data [1], carryOut0);
   return carryOut1;
   }

//----------------------------------------------------------------------------
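The answer leaves the right shift as an exercise. Below is one possible sketch of it for xmm registers, not taken from the benchmark the code above comes from. The names `bitShiftRight128xmm` and `bitShiftRightNxmm` are hypothetical. It uses only SSE2, moves the carries with whole-register byte shifts instead of `_mm_shuffle_epi32`, and passes the count through `_mm_sll_epi64`/`_mm_srl_epi64` so it does not have to be a compile-time constant. The count is assumed to be in 0..64.

```cpp
#include <cassert>
#include <cstddef>
#include <emmintrin.h>   // SSE2

// Hypothetical counterpart to bitShiftLeft128xmm: shift a 128-bit value right.
// Returns the bits shifted out of the low qword, placed at the top of the
// high qword, ready to be OR'ed into the next less significant register.
static __m128i bitShiftRight128xmm(__m128i *data, int count)
{
    assert(count >= 0 && count <= 64);
    const __m128i right = _mm_cvtsi32_si128(count);        // shift counts held in xmm registers
    const __m128i left  = _mm_cvtsi32_si128(64 - count);

    __m128i lost       = _mm_sll_epi64(*data, left);       // bits each qword loses, at its top
    __m128i carryOut   = _mm_slli_si128(lost, 8);          // low qword's lost bits -> high qword
    __m128i innerCarry = _mm_srli_si128(lost, 8);          // high qword's lost bits -> low qword
    *data = _mm_srl_epi64(*data, right);                   // shift both qwords right
    *data = _mm_or_si128(*data, innerCarry);               // carry from high qword into low qword
    return carryOut;
}

// Hypothetical helper: shift n registers right, least significant part stored first.
static __m128i bitShiftRightNxmm(__m128i *data, std::size_t n, int count)
{
    __m128i carryOut = _mm_setzero_si128();
    for (std::size_t i = 0; i < n; ++i) {
        __m128i carry = bitShiftRight128xmm(&data[i], count);
        if (i == 0)
            carryOut = carry;                                 // bits leaving the whole value
        else
            data[i - 1] = _mm_or_si128(data[i - 1], carry);   // feed the register below
    }
    return carryOut;
}
```

For the 20-element array in the question, `n` would be 10, assuming the data has been loaded into an array of `__m128i`.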
