
mirror bits of a 32 bit word

How would you do that in C? (Example: 10110001 becomes 10001101 if we had to mirror 8 bits). Are there any instructions on certain processors that would simplify this task?

It's actually called "bit reversal", and is commonly done in FFT scrambling. The O(log N) way is (for up to 32 bits):

uint32_t reverse(uint32_t x, int bits)
{
    x = ((x & 0x55555555) << 1) | ((x & 0xAAAAAAAA) >> 1); // Swap _<>_
    x = ((x & 0x33333333) << 2) | ((x & 0xCCCCCCCC) >> 2); // Swap __<>__
    x = ((x & 0x0F0F0F0F) << 4) | ((x & 0xF0F0F0F0) >> 4); // Swap ____<>____
    x = ((x & 0x00FF00FF) << 8) | ((x & 0xFF00FF00) >> 8); // Swap adjacent bytes
    x = ((x & 0x0000FFFF) << 16) | ((x & 0xFFFF0000) >> 16); // Swap the 16-bit halves
    return x >> (32 - bits);
}
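
As a quick check of the question's own example, here is a minimal sketch of the same swap network with one addition: an assert on bits, because a shift by 32 (which is what bits == 0 would request) is undefined for a 32-bit value in C. The name reverse_low_bits is made up for this illustration and is not part of the answer above.

```c
#include <assert.h>
#include <stdint.h>

/* Reverse the low `bits` bits of x; input bits above that range are dropped.
   Hypothetical name; same swap network as the answer above. */
uint32_t reverse_low_bits(uint32_t x, int bits)
{
    /* bits == 0 would mean shifting by 32, which is undefined behaviour. */
    assert(bits >= 1 && bits <= 32);

    x = ((x & 0x55555555u) << 1)  | ((x & 0xAAAAAAAAu) >> 1);  /* adjacent bits */
    x = ((x & 0x33333333u) << 2)  | ((x & 0xCCCCCCCCu) >> 2);  /* bit pairs */
    x = ((x & 0x0F0F0F0Fu) << 4)  | ((x & 0xF0F0F0F0u) >> 4);  /* nibbles */
    x = ((x & 0x00FF00FFu) << 8)  | ((x & 0xFF00FF00u) >> 8);  /* bytes */
    x = ((x & 0x0000FFFFu) << 16) | ((x & 0xFFFF0000u) >> 16); /* 16-bit halves */

    /* The reversed field now sits at the top of the word; move it down. */
    return x >> (32 - bits);
}
```

With this, reverse_low_bits(0xB1, 8) gives 0x8D, matching 10110001 -> 10001101 from the question.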

Maybe this small "visualization" helps. It shows the first three assignments, applied to a uint8_t:

b7 b6 b5 b4  b3 b2 b1 b0
-> <- -> <-  -> <- -> <-
----> <----  ----> <----
---------->  <----------

Well, if we're doing ASCII art, here's mine:

7 6 5 4 3 2 1 0
 X   X   X   X 
6 7 4 5 2 3 0 1
 \ X /   \ X /
  X X     X X
 / X \   / X \
4 5 6 7 0 1 2 3
 \ \ \ X / / /
  \ \ X X / /
   \ X X X /
    X X X X
   / X X X \
  / / X X \ \
 / / / X \ \ \
0 1 2 3 4 5 6 7

It looks a lot like FFT butterflies, which is why bit reversal pops up with FFTs.

Per Rich Schroeppel in this MIT memo (if you can read past the assembler), the following will reverse the bits in an 8-bit byte, provided that you have 64-bit arithmetic available:

byte = (byte * 0x0202020202ULL & 0x010884422010ULL) % 1023;

Which sort of fans the bits out (the multiply), selects them (the and) and then shrinks them back down (the modulus).
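
If you don't want to take the magic constants on faith, a small sketch like the following wraps the one-liner in a function and compares it against a plain loop for all 256 byte values. The function names are invented for this illustration.

```c
#include <stdint.h>

/* The multiply / mask / modulus trick quoted above, wrapped in a function. */
uint8_t reverse_byte_mul(uint8_t b)
{
    return (uint8_t)(((b * 0x0202020202ULL) & 0x010884422010ULL) % 1023);
}

/* Reference implementation: move one bit per iteration. */
uint8_t reverse_byte_loop(uint8_t b)
{
    uint8_t r = 0;
    for (int i = 0; i < 8; ++i) {
        r = (uint8_t)((r << 1) | (b & 1u));
        b >>= 1;
    }
    return r;
}

/* Returns 1 if both versions agree on every possible byte, 0 otherwise. */
int reverse_byte_mul_matches_loop(void)
{
    for (unsigned v = 0; v < 256; ++v) {
        if (reverse_byte_mul((uint8_t)v) != reverse_byte_loop((uint8_t)v))
            return 0;
    }
    return 1;
}
```

The reason the modulus works: 2^10 mod 1023 is 1, so a bit at position p lands on position p mod 10, and the mask keeps exactly one copy of each input bit at a position that folds down to its mirrored place.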

Is it actually an 8-bit quantity that you have?

Nearly a duplicate of Most Efficient Algorithm for Bit Reversal ( from MSB->LSB to LSB->MSB) in C (which has a lot of answers, including one AVX2 answer for reversing every 8-bit char in an array).


x86:

On x86 with SSSE3 (Core2 and later, Bulldozer and later), pshufb ( _mm_shuffle_epi8 ) can be used as a nibble LUT to do 16 lookups in parallel. You only need 8 lookups for the 8 nibbles in a single 32-bit integer, but the real problem is splitting the input bytes into separate nibbles (with their upper half zeroed). It's basically the same problem as for pshufb -based popcount.

avx2 register bits reverse shows how to do this for a packed vector of 32-bit elements. The same code ported to 128-bit vectors would compile just fine with AVX.

It's still good for a single 32-bit int because x86 has very efficient round-trip between integer and vector regs: int bitrev = _mm_cvtsi128_si32 ( rbit32( _mm_cvtsi32_si128(input) ) ); . That only costs 2 extra movd instructions to get an integer from an integer register into XMM and back. (Round trip latency = 3 cycles on an Intel CPU like Haswell.)
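
To make the nibble-LUT idea concrete without any intrinsics, here is a plain scalar sketch. To be clear, this is not the pshufb version: it does the 8 lookups one at a time in a loop, whereas pshufb would do them in parallel. The table and function names are made up for this example.

```c
#include <stdint.h>

/* rev4[n] is the 4-bit reversal of n, e.g. 0x1 (0001) -> 0x8 (1000).
   This is the same 16-entry table a pshufb-based version would keep
   in a vector register. */
static const uint8_t rev4[16] = {
    0x0, 0x8, 0x4, 0xC, 0x2, 0xA, 0x6, 0xE,
    0x1, 0x9, 0x5, 0xD, 0x3, 0xB, 0x7, 0xF
};

/* Scalar illustration only: one table lookup per nibble. */
uint32_t reverse32_nibble_lut(uint32_t x)
{
    uint32_t r = 0;
    for (int i = 0; i < 8; ++i) {
        /* Take the lowest remaining nibble, reverse it, and append it to
           r, so the lowest input nibble ends up at the top of the result. */
        r = (r << 4) | rev4[x & 0xFu];
        x >>= 4;
    }
    return r;
}
```

The "x & 0xF" and ">>= 4" steps are the scalar equivalent of the nibble-splitting work that the answer calls the real problem in the SIMD version.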


ARM:

rbit has single-cycle latency, and does a whole 32-bit integer in one instruction.

The naive / slow / simple way is to extract the low bit of the input and shift it into another variable that accumulates a return value.

#include <stdint.h>

uint32_t mirror_u32(uint32_t input) {
    uint32_t returnval = 0;
    for (int i = 0; i < 32; ++i) {
        int bit = input & 0x01;
        returnval <<= 1;
        returnval += bit;    // Shift the isolated bit into returnval
        input >>= 1;
    }
    return returnval;
}

For other types, the number of bits of storage is sizeof(input) * CHAR_BIT , but that includes potential padding bits that aren't part of the value. The fixed-width types are a good idea here.

The += instead of |= makes gcc compile it more efficiently for x86 (using x86's shift-and-add instruction, LEA). Of course, there are much faster ways to bit-reverse; see the other answers. This loop is good for small code size (no large masks), but otherwise pretty much no advantage.

Compilers unfortunately don't recognize this loop as a bit-reverse and optimize it to ARM rbit or whatever. (See it on the Godbolt compiler explorer )

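The fastest approach is almost certainly a lookup table indexed by each byte of the input (in and out are the input and output words viewed as byte arrays, and lut holds the reversal of every byte value):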

out[0]=lut[in[3]];
out[1]=lut[in[2]];
out[2]=lut[in[1]];
out[3]=lut[in[0]];

Or if you can afford 128k of table data (by afford, I mean cpu cache utilization, not main memory or virtual memory utilization), use 16-bit units:

out[0]=lut[in[1]];
out[1]=lut[in[0]];
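
A minimal sketch of the byte-wide variant, assuming a 256-entry table that is filled on first use. It uses shifts instead of the in[]/out[] byte arrays, so it doesn't depend on how the word is laid out in memory. The names here are hypothetical, and the lazy initialisation is just for brevity (it isn't thread-safe; a precomputed const table would avoid that).

```c
#include <stdint.h>

static uint8_t byte_lut[256];   /* byte_lut[v] = v with its 8 bits reversed */
static int byte_lut_ready = 0;

/* Fill the table once using the slow bit-by-bit method. */
static void byte_lut_init(void)
{
    for (unsigned v = 0; v < 256; ++v) {
        uint8_t r = 0;
        for (int i = 0; i < 8; ++i) {
            if (v & (1u << i))
                r |= (uint8_t)(0x80u >> i);   /* bit i goes to bit 7-i */
        }
        byte_lut[v] = r;
    }
    byte_lut_ready = 1;
}

/* Reverse each byte via the table and swap the byte order at the same time. */
uint32_t reverse32_lut(uint32_t x)
{
    if (!byte_lut_ready)
        byte_lut_init();

    return ((uint32_t)byte_lut[x & 0xFFu] << 24)
         | ((uint32_t)byte_lut[(x >> 8) & 0xFFu] << 16)
         | ((uint32_t)byte_lut[(x >> 16) & 0xFFu] << 8)
         |  (uint32_t)byte_lut[(x >> 24) & 0xFFu];
}
```

Whether this actually beats the mask-and-shift version depends on whether the table stays in cache, which is the same caveat given above for the 128k table.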

If you are interested in a more embedded approach: when I worked with an ARMv7-A system, I found the RBIT instruction.

So within a C function using GNU extended asm I could use:

uint32_t bit_reverse32(uint32_t inp32)
{
    uint32_t out = 0;
    asm("RBIT %0, %1" : "=r" (out) : "r" (inp32));
    return out;
}

Some compilers expose an intrinsic C wrapper for this (for example armcc has __rbit), and gcc also has some intrinsics via ACLE, but with gcc-arm-linux-gnueabihf I could not find __rbit, so I came up with the code above.

I didn't look, but I suppose on other platforms you could create similar solutions.
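
One way to keep such code buildable on other platforms is to guard the inline asm with preprocessor checks and fall back to plain C everywhere else. This is only a sketch: the function names are invented, the checks assume a GNU-compatible compiler that defines the usual __aarch64__ / __arm__ / __ARM_ARCH macros, and only the fallback path is exercised on non-ARM machines.

```c
#include <stdint.h>

/* Portable fallback: shift the bits across one at a time. */
uint32_t bit_reverse32_generic(uint32_t x)
{
    uint32_t r = 0;
    for (int i = 0; i < 32; ++i) {
        r = (r << 1) | (x & 1u);
        x >>= 1;
    }
    return r;
}

/* Use RBIT where the target is known to have it, plain C otherwise. */
uint32_t bit_reverse32_any(uint32_t x)
{
#if defined(__GNUC__) && defined(__aarch64__)
    uint32_t out;
    /* %w0 / %w1 select the 32-bit W register names on AArch64. */
    __asm__("rbit %w0, %w1" : "=r"(out) : "r"(x));
    return out;
#elif defined(__GNUC__) && defined(__arm__) && defined(__ARM_ARCH) && (__ARM_ARCH >= 7)
    uint32_t out;
    __asm__("rbit %0, %1" : "=r"(out) : "r"(x));
    return out;
#else
    return bit_reverse32_generic(x);
#endif
}
```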

I also just came up with a minimal solution for mirroring 4 bits (a nibble) using only 16 bits of temporary space.

mirr = ( (orig * 0x222) & 0x1284 ) % 63
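
Here is a small sketch that wraps this formula in a function and checks it against a plain loop for all 16 nibble values. The function names are made up for the example. The largest intermediate value is 15 * 0x222 = 0x1FFE, so it does fit in 16 bits.

```c
#include <stdint.h>

/* The multiply / mask / modulus formula above, for one nibble. */
uint8_t mirror_nibble(uint8_t orig)
{
    /* Three copies of the nibble, starting at bits 1, 5 and 9. */
    uint16_t fanned = (uint16_t)((orig & 0xFu) * 0x222u);
    /* Keep one copy of each bit, then fold positions modulo 6
       (2^6 mod 63 == 1, so a bit at position p lands on p mod 6). */
    return (uint8_t)((fanned & 0x1284u) % 63u);
}

/* Returns 1 if the formula matches a bit-by-bit reversal for 0..15. */
int mirror_nibble_self_check(void)
{
    for (unsigned v = 0; v < 16; ++v) {
        unsigned expected = 0;
        for (int i = 0; i < 4; ++i) {
            if (v & (1u << i))
                expected |= 8u >> i;   /* bit i goes to bit 3-i */
        }
        if (mirror_nibble((uint8_t)v) != expected)
            return 0;
    }
    return 1;
}
```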

I think I would make a lookup table of the bit patterns 0-255. Read each byte, reverse it with the lookup table, and afterwards arrange the resulting bytes in reverse order.

quint64 mirror(quint64 a, quint8 l = 64) {
    quint64 b = 0;
    for (quint8 i = 0; i < l; i++) {
        // Bit (l-1-i) of a becomes bit i of b.
        b |= ((a >> (l - i - 1)) & 1) << i;
    }
    return b;
}

This function mirrors the low l bits of a, so it also works for fewer than 64 bits. For instance, it can mirror 12 bits.

quint64 and quint8 are defined in Qt, but they can be replaced with any equivalent unsigned types.

If you have been staring at Mike DeSimone's great answer (like me), here is a "visualization" of the first three assignments, applied to a uint8_t:

b7 b6 b5 b4  b3 b2 b1 b0
-> <- -> <-  -> <- -> <-
----> <----  ----> <----
---------->  <----------

So first, bitwise swap, then "two-bit-group" swap and so on.

Most people probably won't consider my approach either elegant or efficient: it's aimed at being portable and somewhat "straightforward".

#include <limits.h> // CHAR_BIT

unsigned bit_reverse( unsigned s ) {
  unsigned d;
  int i;
  for( i=CHAR_BIT*sizeof( unsigned ),d=0; i; s>>=1,i-- ) {
    d <<= 1;
    d |= s&1;
  }
  return d;
}

This function pulls the least significant bit from the source bit string s and shifts it into the low end of the destination bit string d, so the first bit pulled ends up as the most significant bit of d .

You can replace the unsigned data type with whatever suits your case, from unsigned char ( CHAR_BIT bits, usually 8) to unsigned long long (at least 64 bits, and exactly 64 on common platforms).

Of course, there can be CPU-specific instructions (or instruction sets) that could be used instead of my plain C code.

But then that wouldn't be "C language" but rather assembly instruction(s) in a C wrapper.

int mirror (int input)
{
  // Return the bit mirror of an 8-bit number
  int tmp2;
  int out = 0;
  for (int i = 0; i < 8; i++)
    {
      out = out << 1;        // make room for the next bit
      tmp2 = input & 0x01;   // isolate the current lowest bit
      out = out | tmp2;
      input = input >> 1;
    }
  return out;
}
