
Global bitwise shift of 128, 256, 512 bit registers using intrinsics?

Consider an array of 64-bit unsigned integers, for example:

std::array<unsigned long long int, 20> a;

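What is the fastest way, including with Intel or compiler intrinsics (using g++ 5.3), to perform a global bit shift (right or left), treating this array as if it were a single large integer?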
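To pin down the operation being asked about, here is a minimal portable sketch of a left shift over the whole array. The names `Words` and `shiftLeftScalar` are hypothetical, and the sketch assumes that `a[0]` holds the least significant word and that `unsigned long long` is exactly 64 bits wide. It is meant as a reference to check faster versions against, not as the fastest approach.

```cpp
#include <array>
#include <cstddef>

// Hypothetical alias for the array from the question.
using Words = std::array<unsigned long long, 20>;

// Shift the whole array left by `count` bits, in place.
// Assumption: a[0] is the least significant word and each word is 64 bits.
inline void shiftLeftScalar(Words &a, unsigned count)
{
    const std::size_t wordShift = count / 64;  // whole words to move
    const unsigned bitShift = count % 64;      // remaining bits within a word

    // Walk from the most significant word down so that every source word
    // is read before it is overwritten.
    for (std::size_t i = a.size(); i-- > 0; ) {
        unsigned long long v = 0;
        if (i >= wordShift) {
            v = a[i - wordShift] << bitShift;
            // Pull in the bits carried out of the next lower word.
            if (bitShift != 0 && i > wordShift)
                v |= a[i - wordShift - 1] >> (64 - bitShift);
        }
        a[i] = v;
    }
}
```

A right shift would follow the same pattern with the loop running upward and the two shift directions swapped.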

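You may want to take a look at std::bitset, which is a container for a number of bits known at compile time. If I understand your question correctly, that is what you are trying to emulate with your array. The bitset class provides overloaded >> and << operators to perform shifts, and these implementations may be optimized in your compiler/standard library combination.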
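As a rough illustration of the std::bitset suggestion, the sketch below uses a 1280-bit bitset, which matches the capacity of the 20-word array in the question. The names `kBits`, `shiftLeft`, and `shiftRight` are hypothetical helpers introduced only for this example; whether this is faster than hand-written intrinsics depends on the standard library and would need to be measured.

```cpp
#include <bitset>
#include <cstddef>

// Same capacity as std::array<unsigned long long, 20> (assumed 64-bit words).
constexpr std::size_t kBits = 20 * 64;

// Shift the whole value left by n bits; bits pushed past the top are dropped.
inline std::bitset<kBits> shiftLeft(std::bitset<kBits> value, std::size_t n)
{
    value <<= n;
    return value;
}

// Shift the whole value right by n bits; zeros are shifted in at the top.
inline std::bitset<kBits> shiftRight(std::bitset<kBits> value, std::size_t n)
{
    value >>= n;
    return value;
}
```

Bits cross the 64-bit word boundaries transparently here: setting bit 63 and shifting left by one leaves only bit 64 set.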

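Here are some x86 left-shift functions that use the xmm and ymm registers through intrinsics. Writing the corresponding right-shift functions should not be hard. They are taken from a software LFSR benchmark.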

#include <immintrin.h>   // SSE2 and AVX2 intrinsics used below

//----------------------------------------------------------------------------
// bit shift left a 128-bit value using xmm registers
//          __m128i *data - data to shift
//          int count     - number of bits to shift (0..63)
// return:  __m128i       - carry out bit(s), in the low qword

static __m128i bitShiftLeft128xmm (__m128i *data, int count)
   {
   __m128i innerCarry, carryOut;

   innerCarry = _mm_srli_epi64 (*data, 64 - count);      // carry outs in the low bits of each qword
   carryOut   = _mm_srli_si128 (innerCarry, 8);          // upper carry in low qword, high qword zero
   innerCarry = _mm_slli_si128 (innerCarry, 8);          // lower carry in high qword, low qword zero
   *data = _mm_slli_epi64 (*data, count);                // shift all qwords left
   *data = _mm_or_si128 (*data, innerCarry);             // propagate carry out from low qword
   return carryOut;
   }

//----------------------------------------------------------------------------
// bit shift left a 256-bit value using xmm registers
//          __m128i *data - data to shift, ls part stored first 
//          int count     - number of bits to shift
// return:  __m128i       - carry out bit(s)

static __m128i bitShiftLeft256xmm (__m128i *data, int count)
   {
   __m128i carryOut0, carryOut1;

   carryOut0 = bitShiftLeft128xmm (&data [0], count);
   carryOut1 = bitShiftLeft128xmm (&data [1], count);
   data [1] = _mm_or_si128 (data [1], carryOut0);
   return carryOut1;
   }

//----------------------------------------------------------------------------
// bit shift left a 512-bit value using xmm registers
//          __m128i *data - data to shift, ls part stored first 
//          int count     - number of bits to shift
// return:  __m128i       - carry out bit(s)

static __m128i bitShiftLeft512xmm (__m128i *data, int count)
   {
   __m128i carryOut0, carryOut1;

   carryOut0 = bitShiftLeft256xmm (&data [0], count);
   carryOut1 = bitShiftLeft256xmm (&data [2], count);
   data [2] = _mm_or_si128 (data [2], carryOut0);
   return carryOut1;
   }


//----------------------------------------------------------------------------
// bit shift left a 256-bit value using ymm registers (requires AVX2)
//          __m256i *data - data to shift
//          int count     - number of bits to shift
// return:  __m256i       - carry out bit(s)

static __m256i bitShiftLeft256ymm (__m256i *data, int count)
   {
   __m256i innerCarry, carryOut, rotate;

   innerCarry = _mm256_srli_epi64 (*data, 64 - count);                        // carry outs in bit 0 of each qword
   rotate     = _mm256_permute4x64_epi64 (innerCarry, 0x93);                  // rotate ymm left 64 bits
   innerCarry = _mm256_blend_epi32 (_mm256_setzero_si256 (), rotate, 0xFC);   // clear lower qword
   *data    = _mm256_slli_epi64 (*data, count);                               // shift all qwords left
   *data    = _mm256_or_si256 (*data, innerCarry);                            // propagate carrys from low qwords
   carryOut   = _mm256_xor_si256 (innerCarry, rotate);                        // clear all except lower qword
   return carryOut;
   }

//----------------------------------------------------------------------------
// bit shift left a 512-bit value using ymm registers
//          __m256i *data - data to shift, ls part stored first 
//          int count     - number of bits to shift
// return:  __m256i       - carry out bit(s)

static __m256i bitShiftLeft512ymm (__m256i *data, int count)
   {
   __m256i carryOut0, carryOut1;

   carryOut0 = bitShiftLeft256ymm (&data [0], count);
   carryOut1 = bitShiftLeft256ymm (&data [1], count);
   data [1] = _mm256_or_si256 (data [1], carryOut0);
   return carryOut1;
   }

//----------------------------------------------------------------------------
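The right-shift direction mentioned above could look roughly like the following sketch, which uses only SSE2 and applies the same carry chaining to the 20-word array from the question. The names `bitShiftRight128xmm` and `shiftRight` are hypothetical, the count is assumed to be in the range 0..63, and `a[0]` is assumed to hold the least significant word. It has not been benchmarked.

```cpp
#include <emmintrin.h>   // SSE2
#include <array>
#include <cstddef>

// Shift a 128-bit value right by count bits (assumed 0..63).
// Returns the bits shifted out of the low end, placed at the top of the
// upper qword so they can be OR-ed into the next lower 128-bit block.
static __m128i bitShiftRight128xmm(__m128i *data, int count)
{
    const __m128i up   = _mm_cvtsi32_si128(64 - count);
    const __m128i down = _mm_cvtsi32_si128(count);

    __m128i carries  = _mm_sll_epi64(*data, up);    // bits leaving each qword, moved to its top
    __m128i carryOut = _mm_slli_si128(carries, 8);  // low qword's carry -> upper qword, low qword zero
    __m128i inner    = _mm_srli_si128(carries, 8);  // high qword's carry -> low qword, high qword zero

    *data = _mm_srl_epi64(*data, down);             // shift both qwords right
    *data = _mm_or_si128(*data, inner);             // bring the high qword's carry into the low qword
    return carryOut;
}

// Shift the whole array right by count bits (assumed 0..63), in place.
// Assumption: a[0] is the least significant word.
static void shiftRight(std::array<unsigned long long, 20> &a, int count)
{
    __m128i carry = _mm_setzero_si128();            // nothing shifts in at the top
    for (std::size_t i = a.size(); i >= 2; i -= 2) {
        __m128i *p = reinterpret_cast<__m128i *>(&a[i - 2]);
        __m128i block = _mm_loadu_si128(p);         // unaligned load: std::array is not 16-byte aligned
        __m128i out = bitShiftRight128xmm(&block, count);
        _mm_storeu_si128(p, _mm_or_si128(block, carry));
        carry = out;                                // feeds the next lower block
    }
}
```

The loop runs from the most significant block downward because, in a right shift, each block receives its incoming bits from the block above it. Shifts of 64 bits or more would need an additional whole-word move before this step.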
