
Bit reversal for an N-bit word using C++ constexpr

I am working on a bit-reversal algorithm for an FFT implementation. My implementation so far is:

//assume the proper includes

template<unsigned long bits>
unsigned long&& bitreverse(unsigned long value){
    std::bitset<bits> input(value);
    std::bitset<bits> result;
    unsigned long j=input.size()-1;
    for (unsigned long i=0; i<input.size(); ++i) {
        result[i]=input[j];
        j--;
    }
    return std::move(result.to_ulong());
}

I need to be able to reverse the bits in an N-bit word. My current implementation is functional, but I would like to rewrite it so that the result can be used as a constexpr. The function signature would need to be either:

template<unsigned long bits>
constexpr unsigned long&& bitreverse(unsigned long value);

or:

template<unsigned long bits,unsigned long value>
constexpr unsigned long&& bitreverse();

or something close...

I'm not sure how to begin implementing this.

I would like to avoid bitwise operations if possible, but I'm not opposed to them.

Thanks

You could just do this (it needs C++14 or later, because the constexpr function contains a loop):

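#include <cstddef>  // std::size_t

template <unsigned long bits>
constexpr unsigned long bitreverse(unsigned long value) {
    unsigned long result = 0;
    for (std::size_t i = 0, j = bits - 1; i < bits; ++i, --j) {
        // Move bit j of the input to position i. Shifting value (rather than
        // the int literal 1) keeps the arithmetic in unsigned long.
        result |= ((value >> j) & 1UL) << i;
    }
    return result;
}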

I'm not sure why you want to use an r-value reference for the return type. It's not going to make anything more efficient, and it will result in a dangling reference, because the reference binds to a temporary that is destroyed when the function returns.
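As a quick way to check that a loop-based version like this really is usable in constant expressions, here is a minimal sketch. The name reverse_bits_loop and the range check on Bits are assumptions made for illustration, not part of the answer above; it assumes a C++14 compiler (loops in constexpr functions and binary literals).

```cpp
#include <cstddef>
#include <limits>

// Hypothetical helper name, chosen so it does not clash with bitreverse above.
template <std::size_t Bits>
constexpr unsigned long reverse_bits_loop(unsigned long value) {
    // Assumption: reject widths that do not fit in unsigned long.
    static_assert(Bits >= 1 &&
                  Bits <= static_cast<std::size_t>(
                              std::numeric_limits<unsigned long>::digits),
                  "Bits must fit in unsigned long");
    unsigned long result = 0;
    for (std::size_t i = 0; i < Bits; ++i) {
        // Take bit i of the input and place it at position Bits - 1 - i.
        result |= ((value >> i) & 1UL) << (Bits - 1 - i);
    }
    return result;
}

// Evaluated entirely at compile time; a wrong result is a compile error.
static_assert(reverse_bits_loop<3>(0b001UL) == 0b100UL, "3-bit reversal");
static_assert(reverse_bits_loop<4>(0b0011UL) == 0b1100UL, "4-bit reversal");
static_assert(reverse_bits_loop<8>(0x01UL) == 0x80UL, "8-bit reversal");
static_assert(reverse_bits_loop<32>(1UL) == 0x80000000UL, "32-bit reversal");
```

Because the function returns by value, the result can also initialise a constexpr variable or be used as an array size or template argument.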

Well, here's the obvious, "brute force" approach.

This assumes that the unsigned long long data type on the implementation is a 64-bit integer. The code can obviously be pruned down for 32-bit platforms.

Observe that a bitreverse can always be handled initially as a 64-bit bitreverse, then shifted right to get the correct number of bits out of it.

This is a bit wordy, but it has the advantage that it's straightforward enough that most compilers can chew through a bitreverse of a constant at compile time. For a variable, this will certainly generate more code than the iterative approach, but modern CPUs will likely be able to plow through it without much delay, because they won't have to deal with looping and branch prediction.

template<unsigned long bits>
constexpr unsigned long long bitreverse(unsigned long long v)
{
    return (((v & 0x00000001ULL) << 63) |
        ((v & 0x00000002ULL) << 61) |
        ((v & 0x00000004ULL) << 59) |
        ((v & 0x00000008ULL) << 57) |
        ((v & 0x00000010ULL) << 55) |
        ((v & 0x00000020ULL) << 53) |
        ((v & 0x00000040ULL) << 51) |
        ((v & 0x00000080ULL) << 49) |
        ((v & 0x00000100ULL) << 47) |
        ((v & 0x00000200ULL) << 45) |
        ((v & 0x00000400ULL) << 43) |
        ((v & 0x00000800ULL) << 41) |
        ((v & 0x00001000ULL) << 39) |
        ((v & 0x00002000ULL) << 37) |
        ((v & 0x00004000ULL) << 35) |
        ((v & 0x00008000ULL) << 33) |
        ((v & 0x00010000ULL) << 31) |
        ((v & 0x00020000ULL) << 29) |
        ((v & 0x00040000ULL) << 27) |
        ((v & 0x00080000ULL) << 25) |
        ((v & 0x00100000ULL) << 23) |
        ((v & 0x00200000ULL) << 21) |
        ((v & 0x00400000ULL) << 19) |
        ((v & 0x00800000ULL) << 17) |
        ((v & 0x01000000ULL) << 15) |
        ((v & 0x02000000ULL) << 13) |
        ((v & 0x04000000ULL) << 11) |
        ((v & 0x08000000ULL) << 9) |
        ((v & 0x10000000ULL) << 7) |
        ((v & 0x20000000ULL) << 5) |
        ((v & 0x40000000ULL) << 3) |
        ((v & 0x80000000ULL) << 1) |
        ((v & 0x100000000ULL) >> 1) |
        ((v & 0x200000000ULL) >> 3) |
        ((v & 0x400000000ULL) >> 5) |
        ((v & 0x800000000ULL) >> 7) |
        ((v & 0x1000000000ULL) >> 9) |
        ((v & 0x2000000000ULL) >> 11) |
        ((v & 0x4000000000ULL) >> 13) |
        ((v & 0x8000000000ULL) >> 15) |
        ((v & 0x10000000000ULL) >> 17) |
        ((v & 0x20000000000ULL) >> 19) |
        ((v & 0x40000000000ULL) >> 21) |
        ((v & 0x80000000000ULL) >> 23) |
        ((v & 0x100000000000ULL) >> 25) |
        ((v & 0x200000000000ULL) >> 27) |
        ((v & 0x400000000000ULL) >> 29) |
        ((v & 0x800000000000ULL) >> 31) |
        ((v & 0x1000000000000ULL) >> 33) |
        ((v & 0x2000000000000ULL) >> 35) |
        ((v & 0x4000000000000ULL) >> 37) |
        ((v & 0x8000000000000ULL) >> 39) |
        ((v & 0x10000000000000ULL) >> 41) |
        ((v & 0x20000000000000ULL) >> 43) |
        ((v & 0x40000000000000ULL) >> 45) |
        ((v & 0x80000000000000ULL) >> 47) |
        ((v & 0x100000000000000ULL) >> 49) |
        ((v & 0x200000000000000ULL) >> 51) |
        ((v & 0x400000000000000ULL) >> 53) |
        ((v & 0x800000000000000ULL) >> 55) |
        ((v & 0x1000000000000000ULL) >> 57) |
        ((v & 0x2000000000000000ULL) >> 59) |
        ((v & 0x4000000000000000ULL) >> 61) |
        ((v & 0x8000000000000000ULL) >> 63)) >> (64 - bits);
}
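The same "reverse all 64 bits, then shift right" idea can be written more compactly with a different, well-known technique: swapping adjacent bits, then adjacent pairs, nibbles, bytes, and so on. The sketch below is not the code from this answer; the names reverse64 and reverse_bits_swap are hypothetical, and it assumes C++14 and that std::uint64_t is available.

```cpp
#include <cstdint>

// Reverse all 64 bits by swapping progressively larger groups of bits.
constexpr std::uint64_t reverse64(std::uint64_t v) {
    v = ((v >> 1)  & 0x5555555555555555ULL) | ((v & 0x5555555555555555ULL) << 1);   // single bits
    v = ((v >> 2)  & 0x3333333333333333ULL) | ((v & 0x3333333333333333ULL) << 2);   // bit pairs
    v = ((v >> 4)  & 0x0F0F0F0F0F0F0F0FULL) | ((v & 0x0F0F0F0F0F0F0F0FULL) << 4);   // nibbles
    v = ((v >> 8)  & 0x00FF00FF00FF00FFULL) | ((v & 0x00FF00FF00FF00FFULL) << 8);   // bytes
    v = ((v >> 16) & 0x0000FFFF0000FFFFULL) | ((v & 0x0000FFFF0000FFFFULL) << 16);  // 16-bit halves
    return (v >> 32) | (v << 32);                                                   // 32-bit halves
}

// Reverse the low Bits bits: do the full 64-bit reversal, then shift the
// result down so only Bits bits remain. Input bits above Bits are discarded.
template <unsigned Bits>
constexpr std::uint64_t reverse_bits_swap(std::uint64_t v) {
    // Assumption: Bits == 0 is rejected because a shift by 64 is undefined.
    static_assert(Bits >= 1 && Bits <= 64, "Bits must be in [1, 64]");
    return reverse64(v) >> (64 - Bits);
}

static_assert(reverse_bits_swap<3>(0b001) == 0b100, "3-bit reversal");
static_assert(reverse_bits_swap<4>(0b0011) == 0b1100, "4-bit reversal");
static_assert(reverse64(1) == 0x8000000000000000ULL, "full 64-bit reversal");
```

This uses six mask-and-shift steps instead of 64 terms, at the cost of being less obvious to read than the fully unrolled expression.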
