
Run-time bit copy (bit-masking) in C++

I have a problem at hand and solved it one way, but I am not happy with my solution because it doesn't work in every context. The solution has to be in C++11.

I have a char array and an int. Given a bit offset relative to data and a length (in bits), I want to extract the bits from offset to offset+length from the array and store them in out.

char8_t data[8];
int32_t out;
int32_t offset;
int32_t length;

[Figure: example with offset=24, length=4]

Both the offset and the length are only available at run-time. Hence, I would like to avoid creating bitmasks. I personally solved it by casting the complete array to int64_t and then right-shift by (64-offset-length) and left-shift by (64-length).

out = (*(int64_t*)data) >> (64-offset-length) << (64-length);

The issue: if my array were longer, no primitive type could hold the complete array, and my solution would no longer work. Is there a better way to do this that scales?

In an ideal world I could create a pointer with a bit offset, but this is C++ not an ideal world.

Alternatives I thought about: adding up bits with += on "out" by iterating through the array and left-shifting. Quite inelegant!
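For reference, the bit-by-bit alternative described above could look roughly like the sketch below. The helper name extract_bits_msb_first and its parameters are made up for illustration, and it assumes that bit 0 is the most significant bit of data[0], which is only one possible reading of the figure.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper, not from the original post.
// Assumption: bit 0 is the most significant bit of data[0].
std::uint32_t extract_bits_msb_first(const unsigned char* data, std::size_t size,
                                     std::size_t offset, std::size_t length) {
    assert(length <= 32);                 // the result must fit into 32 bits
    assert(offset + length <= size * 8);  // stay inside the buffer
    std::uint32_t out = 0;
    for (std::size_t i = 0; i < length; ++i) {
        std::size_t bit = offset + i;
        // Pick one bit out of its byte, counting from the top of the byte.
        unsigned value = (data[bit / 8] >> (7 - bit % 8)) & 1u;
        out = (out << 1) | value;         // append it as the new lowest bit
    }
    return out;
}
```

With the bytes {0x12, 0x34, 0x56, 0xA5, ...}, offset=24 and length=4 this yields 0xA. It works for arrays of any size, but it touches every bit individually, which is the performance concern raised in the question.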

I am aware that there are similar questions out there, but either they have been poorly answered or the answers have hefty performance implications.

First, your approach depends on endianness, i.e. on whether the system stores the most significant byte at the beginning or at the end of the respective 8-byte memory block.
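To make the endianness point concrete, here is a small sketch that compares a native 8-byte load with an explicit byte-by-byte load. The function names load_native and load_big_endian are invented for this example. The native load uses std::memcpy instead of the pointer cast from the question, since the cast can also run into alignment and aliasing problems.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper: native load, the result depends on the host byte order.
std::uint64_t load_native(const unsigned char* data) {
    std::uint64_t value;
    std::memcpy(&value, data, sizeof value);  // copies exactly 8 bytes
    return value;
}

// Hypothetical helper: explicit load, data[0] is always the most significant byte.
std::uint64_t load_big_endian(const unsigned char* data, std::size_t size) {
    assert(size <= 8);  // more bytes would not fit into 64 bits
    std::uint64_t value = 0;
    for (std::size_t i = 0; i < size; ++i) {
        value = (value << 8) | data[i];
    }
    return value;
}
```

For the bytes 1, 2, ..., 8, load_big_endian returns 0x0102030405060708 on every platform, whereas load_native returns that value on a big-endian host and 0x0807060504030201 on a little-endian one.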

Second, I'd use unsigned data types, e.g. uint8_t data[8] and uint32_t out, in order to deal correctly with bit shifts and (automatic) type promotions.

If you know exactly where in data[8] a specific piece of information is stored and in which byte order, you could write it as follows:

uint32_t out = data[0] + 256*data[1]; 
...

That way your "decoder" is tied to the order and meaning of the original data, your data may be longer than the largest integral data type, and you avoid the undefined behaviour that shifting signed integral values into the sign bit might introduce.

If your offset is not a multiple of 8, i.e. the "value" does not start at the beginning of a byte, you can still use bit shift operations to correct this. Let's assume that the value starts at an offset of 2 bits; then you could write:

uint32_t out = (data[0] >> 2) + (data[1] << 6) + (data[2] << (6+8));
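The same shift pattern can be generalised to an offset and length that are only known at run time. The following is a sketch under the same assumption as the line above, namely that bit 0 is the least significant bit of data[0] and that later bytes are more significant. The name extract_bits_lsb_first is made up for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper. Assumption: bit 0 is the least significant bit of data[0].
std::uint32_t extract_bits_lsb_first(const unsigned char* data, std::size_t size,
                                     std::size_t offset, std::size_t length) {
    assert(length >= 1 && length <= 32);  // the result must fit into 32 bits
    assert(offset + length <= size * 8);  // stay inside the buffer
    std::size_t first = offset / 8;                // first byte touched
    std::size_t last = (offset + length - 1) / 8;  // last byte touched
    // At most 5 bytes are involved (32 bits plus up to 7 bits of shift),
    // so a 64-bit accumulator is always large enough.
    std::uint64_t acc = 0;
    for (std::size_t i = first; i <= last; ++i) {
        acc |= static_cast<std::uint64_t>(data[i]) << (8 * (i - first));
    }
    acc >>= offset % 8;                    // drop the bits below the offset
    acc &= (std::uint64_t(1) << length) - 1;  // keep only `length` bits
    return static_cast<std::uint32_t>(acc);
}
```

Only the bytes that overlap the requested range are read, so the size of the array no longer matters. For data = {0xFC, 0x03, ...}, offset=2 and length=8 the result is 0xFF.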

But in the end your target is limited to a specific number of bits, since the implementation on your platform fixes the size of each primitive data type, with unsigned long long probably being 64 bits. This limit is implementation-defined; the standard only guarantees a minimum number of bits for each data type. Whether the limit comes from the register width or something else, you cannot know: it's implementation-defined.
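If the code relies on particular widths, they can be checked at compile time. The following sketch uses only standard facilities; the function name max_single_value_bits is invented for illustration.

```cpp
#include <climits>
#include <cstdint>
#include <limits>

// The standard guarantees these minimums; the exact sizes are implementation-defined.
static_assert(CHAR_BIT >= 8, "a byte has at least 8 bits");
static_assert(std::numeric_limits<unsigned long long>::digits >= 64,
              "unsigned long long has at least 64 value bits");
// The fixed-width types, where available, have exactly the stated width.
static_assert(std::numeric_limits<std::uint32_t>::digits == 32,
              "uint32_t has exactly 32 value bits");

// Hypothetical helper: the largest number of bits one built-in integer can hold here.
constexpr int max_single_value_bits() {
    return std::numeric_limits<unsigned long long>::digits;
}
```

Anything longer than max_single_value_bits() has to be represented as several integers or as a bitset-like container.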

Have you tried std::vector<bool>? It is a specialization of std::vector that combines the dynamic size of vector with the compactness of std::bitset.
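A rough sketch of how that could look is shown below. The helper names unpack_bits and slice_value are made up, and the bit order (most significant bit of each byte first) is an assumption. Note that std::vector<bool> offers no shift operators, so reading a range still means looping over the individual bits.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: unpack every byte into single bits, most significant bit first.
std::vector<bool> unpack_bits(const unsigned char* data, std::size_t size) {
    std::vector<bool> bits;
    bits.reserve(size * 8);
    for (std::size_t i = 0; i < size; ++i) {
        for (int b = 7; b >= 0; --b) {
            bits.push_back(((data[i] >> b) & 1) != 0);
        }
    }
    return bits;
}

// Hypothetical helper: read `length` bits starting at `offset` as an unsigned value.
std::uint32_t slice_value(const std::vector<bool>& bits,
                          std::size_t offset, std::size_t length) {
    assert(length <= 32);                   // the result must fit into 32 bits
    assert(offset + length <= bits.size()); // stay inside the container
    std::uint32_t out = 0;
    for (std::size_t i = 0; i < length; ++i) {
        out = (out << 1) | (bits[offset + i] ? 1u : 0u);
    }
    return out;
}
```

This keeps arbitrarily long data in one container, at the cost of an extra unpacking pass over the whole array.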

I'd use a std::bitset as a temporary: first copy whole bytes in a loop, then perform the bit alignment.

unsigned startbit = offset;
unsigned startbyte = startbit / 8;
unsigned endbit = offset + length - 1;
unsigned endbyte = endbit / 8;

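std::bitset<8*(sizeof(out) + 1)> align(0);
// byte-aligned copy; the index is signed so the loop also terminates when startbyte == 0
for(int byte = static_cast<int>(endbyte); byte >= static_cast<int>(startbyte); --byte) {
// for(unsigned byte = startbyte; byte <= endbyte; ++byte) { // check endianness
    align <<= 8;
    align |= static_cast<unsigned char>(data[byte]); // cast avoids sign extension of negative chars
}
align >>= startbit % 8; // bit align
align &= ((1ULL << length) - 1); // mask; 1ULL keeps the shift defined for length up to 32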

out = align.to_ullong();

I use std::bitset and boost::dynamic_bitset to represent and manipulate binary data. std::bitset works well if the length is fixed; otherwise boost::dynamic_bitset is a good choice. With it, you can extract bits using the overloaded bit operators:

#include <boost/dynamic_bitset.hpp>

using boost::dynamic_bitset;

dynamic_bitset<unsigned char> extract(unsigned char* first, unsigned char* last, int offset, int length) {
   dynamic_bitset<unsigned char> bits(first, last);

   bits >>= bits.size() - (offset  + length);
   bits.resize(length);

   return bits;
}

So instead of int32_t out; you can use dynamic_bitset<> to hold values of arbitrary bit length in an efficient way.
