High Order Bits - Take them and make a uint64_t into a uint8_t

Let's say you have a uint64_t and care only about the high order bit for each byte in your uint64_t. Like so:

uint64_t: 0000 ... 1000 0000 1000 0000 1000 0000 1000 0000 ---> 0000 1111

Is there a faster way than:

   return
   (
     ((x >> 56) & 128)+
     ((x >> 49) &  64)+
     ((x >> 42) &  32)+
     ((x >> 35) &  16)+
     ((x >> 28) &   8)+
     ((x >> 21) &   4)+
     ((x >> 14) &   2)+
     ((x >>  7) &   1)
   );

That is, shifting x, masking, and adding the correct bit for each byte? This compiles to a lot of assembly and I'm looking for a quicker way. The machine I'm using only has up to SSE2 instructions, and I failed to find helpful SIMD ops.

Thanks for the help.

As I mentioned in a comment, pmovmskb does what you want. Here's how you could use it:

MMX + SSE1:

movq mm0, input ; input can be r/m
pmovmskb output, mm0 ; output must be r

SSE2:

movq xmm0, input
pmovmskb output, xmm0

And I looked up the newer way:

BMI2:

mov rax, 0x8080808080808080
pext output, input, rax ; input must be r

return ((x & 0x8080808080808080) * 0x2040810204081) >> 56;

works. The & selects the bits you want to keep. The multiplication moves all the bits into the most significant byte, and the shift moves them to the least significant byte. Since multiplication is fast on most modern CPUs, this shouldn't be much slower than using assembly.
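To see why the multiplication gathers the bits cleanly, here is a minimal sketch of the same expression wrapped in a function. The name high_bits_mul and the constant names are assumptions made for illustration; the arithmetic is the one quoted above.

```c
#include <stdint.h>

/* Gather the top bit of each byte of x into one byte (byte i -> bit i). */
uint8_t high_bits_mul(uint64_t x)
{
    /* Keep only bit 7 of every byte: bits 7, 15, ..., 63. */
    const uint64_t mask  = 0x8080808080808080ULL;
    /* Multiplier with bits 0, 7, 14, ..., 49 set (0x2040810204081). */
    const uint64_t magic = 0x0002040810204081ULL;

    /* A kept bit at position 8*i + 7 times 2^(7*j) lands at 8*i + 7 + 7*j.
       With j = 7 - i that is 56 + i, so byte i's flag reaches bit i of the
       top byte. No two partial products share a position, so there are no
       carries to disturb the result. */
    return (uint8_t)(((x & mask) * magic) >> 56);
}
```

For example, high_bits_mul(0x0080008000800080) gives 0x55, since only bytes 0, 2, 4 and 6 have their top bit set.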

And here's how to do it using SSE intrinsics:

#include <xmmintrin.h>
#include <inttypes.h>
#include <stdio.h>

int main (void)
{
  uint64_t x
  = 0b0000000010000000000000001000000000000000100000000000000010000000;

  printf ("%x\n", _mm_movemask_pi8 ((__m64) x));
  return 0;
}

Works fine with:

gcc -msse
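Since the asker's machine has SSE2, the same movemask can also be written with the 128-bit intrinsics from <emmintrin.h>, which avoids the MMX register and the __m64 cast. This is a sketch rather than a tuned implementation, and high_bits_sse2 is a hypothetical name:

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

uint8_t high_bits_sse2(uint64_t x)
{
    /* Put x in the low 64 bits of an XMM register; the high 64 bits are 0. */
    __m128i v = _mm_set_epi64x(0, (long long)x);

    /* pmovmskb: one result bit per byte. Bits 8..15 come from the zeroed
       upper half, so only the low 8 bits can be set. */
    return (uint8_t)_mm_movemask_epi8(v);
}
```

On x86-64 SSE2 is part of the baseline, so no extra flag is needed there; on 32-bit x86, gcc -msse2 enables it.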

You don't need all the separate bitwise ANDs; you can simplify it to:

x &= 0x8080808080808080;
return (x >>  7) | (x >> 14) | (x >> 21) | (x >> 28) |
       (x >> 35) | (x >> 42) | (x >> 49) | (x >> 56);

(assuming that the function return type is uint8_t ).

You can also convert that to an unrolled loop:

uint8_t r = 0;

x &= 0x8080808080808080;

x >>= 7; r |= x;
x >>= 7; r |= x;
x >>= 7; r |= x;
x >>= 7; r |= x;
x >>= 7; r |= x;
x >>= 7; r |= x;
x >>= 7; r |= x;
x >>= 7; r |= x;
return r;

I'm not sure which will perform better in practice, though I'd tend to bet on the first - the second might produce shorter code but with a long dependency chain.

First you don't really need so many operations. You can act on more than one bit at a time:

x = (x >> 7) & 0x0101010101010101; // 0x0101010101010101
x |= x >> 28;                      // 0x????????11111111
x |= x >> 14;                      // 0x????????????5555
x |= x >>  7;                      // 0x??????????????FF
return x & 0xFF;
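Shift sequences like this are easy to get subtly wrong, so it can help to check one exhaustively. The sketch below wraps the folding steps in a function and round-trips all 256 possible high-bit patterns, with every low bit set as noise. The names high_bits_fold and fold_roundtrip_ok are assumptions made for illustration.

```c
#include <stdint.h>

uint8_t high_bits_fold(uint64_t x)
{
    x = (x >> 7) & 0x0101010101010101ULL; /* byte i's flag now sits at bit 8*i */
    x |= x >> 28;  /* flags of bytes 4..7 drop into bits 4, 12, 20, 28 */
    x |= x >> 14;  /* low 16 bits now hold flags at bits 0, 2, ..., 14 */
    x |= x >> 7;   /* low 8 bits now hold byte i's flag at bit i */
    return (uint8_t)(x & 0xFF);
}

/* Returns 1 if every 8-bit pattern survives a spread-then-gather round trip. */
int fold_roundtrip_ok(void)
{
    for (unsigned p = 0; p < 256; ++p) {
        /* Start with all non-flag bits set so stray bits would be noticed. */
        uint64_t x = 0x7F7F7F7F7F7F7F7FULL;
        for (int i = 0; i < 8; ++i)
            if (p & (1u << i))
                x |= (uint64_t)0x80 << (8 * i); /* set the top bit of byte i */
        if (high_bits_fold(x) != p)
            return 0;
    }
    return 1;
}
```

The same round-trip loop can be pointed at any of the other variants in this thread.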

An alternative is to use modulo to do sideways additions. The first thing to note is that x % n is the sum of the digits of x in base n+1 (reduced modulo n), so if n+1 is 2^k, you are adding groups of k bits. If you start with t = (x >> 7) & 0x0101010101010101 like above, you want to sum groups of 7 bits, thus t % 127 would be the solution. But t % 127 works only for results up to 126. 0x8080808080808080 and anything above will give an incorrect result. I've tried some corrections; none were easy.
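The digit-sum view can be made concrete with a small sketch. Both function names here are hypothetical; the second one is the t % 127 candidate described above, shown together with the inputs where it breaks down.

```c
#include <stdint.h>

/* Sum of the base-128 digits (7-bit groups) of t.
   t % 127 is congruent to this sum modulo 127. */
unsigned digit_sum_base128(uint64_t t)
{
    unsigned sum = 0;
    while (t != 0) {
        sum += (unsigned)(t & 0x7F); /* lowest 7-bit digit */
        t >>= 7;
    }
    return sum;
}

/* Candidate gather using modulo 127. */
uint8_t high_bits_mod127(uint64_t x)
{
    uint64_t t = (x >> 7) & 0x0101010101010101ULL; /* flags at bits 0, 8, ..., 56 */
    /* Bit 8*i falls at bit (i mod 7) of its 7-bit group, so bytes 0..6 map
       to bits 0..6, but byte 7 (bit 56) collides with byte 0 on bit 0. */
    return (uint8_t)(t % 127);
}
```

With only the two lowest bytes flagged, high_bits_mod127(0x8080) correctly gives 3. With only the top byte flagged, high_bits_mod127(0x8000000000000000) gives 1 instead of 0x80, and with bytes 0 to 6 all flagged the digit sum is 127, which reduces to 0 instead of 0x7F.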

Using modulo to reach the point where only the last step of the previous algorithm remains was possible, though. What we want is to keep the two least significant bits, and then have the sum of the others, grouped by 14. So

ull t = (x & 0x8080808080808080) >> 7;
ull u = (t & 3) | (((t>>2) % 0x3FFF) << 2);
return (u | (u>>7)) & 0xFF;

But t>>2 is t/4 and << 2 is multiplying by 4. And we have (a % b)*c == (a*c) % (b*c), thus (((t>>2) % 0x3FFF) << 2) is (t & ~3) % 0xFFFC. But we also have the fact that a + b%c == (a+b)%c if a + b%c is less than c. So we have simply u = t % 0xFFFC. Giving:

ull t = ((x & 0x8080808080808080) >> 7) % 0xFFFC;
return (t | (t>>7)) & 0xFF;
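The algebra above can be checked numerically. The following sketch assumes ull means unsigned long long, and the function names are made up for illustration. It compares the two-step form with the simplified t % 0xFFFC over every possible pattern of high bits.

```c
#include <stdint.h>

typedef unsigned long long ull; /* assumed meaning of `ull` in the snippets above */

/* Two-step form: keep bits 0..1, sum the rest in 14-bit groups. */
ull fold_two_step(ull t)
{
    return (t & 3) | (((t >> 2) % 0x3FFF) << 2);
}

/* Simplified form derived above. */
ull fold_one_step(ull t)
{
    return t % 0xFFFC;
}

/* Returns 1 if both forms agree and give the right byte for all 256 patterns. */
int mod_forms_agree(void)
{
    for (unsigned p = 0; p < 256; ++p) {
        ull x = 0x7F7F7F7F7F7F7F7FULL;       /* low bits are just noise */
        for (int i = 0; i < 8; ++i)
            if (p & (1u << i))
                x |= 0x80ULL << (8 * i);     /* set the top bit of byte i */
        ull t = (x & 0x8080808080808080ULL) >> 7;
        ull u = fold_one_step(t);
        if (u != fold_two_step(t))
            return 0;
        if (((u | (u >> 7)) & 0xFF) != p)
            return 0;
    }
    return 1;
}
```

The reduction works because 2^16 leaves remainder 4 when divided by 0xFFFC, so every flag above bit 15 drops by 14 positions at a time until it lands on a distinct even bit below 16.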

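This seems to work, but only while the high bit of the top byte is clear and the result is below 127 (bits 7 and 63 both reduce to bit 0 modulo 127):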

return (x & 0x8080808080808080) % 127;
