Run-time bit copy (bit-masking) in C++

Question

I have a problem at hand that I solved one way, but I am not happy with how I solved it, as it doesn't work in every context. The solution has to be in C++11.

I have a char array and an int. Given a bit offset relative to data and a length (in bits), I want to extract the bits from offset to offset+length from the array and store them in out.

char8_t data[8];
int32_t out;
int32_t offset;
int32_t length;

[Figure: example with offset=24, length=4]

Both the offset and the length are only available at run-time. Hence, I would like to avoid creating bitmasks. I personally solved it by casting the complete array to int64_t, then left-shifting by offset (to discard the bits in front of the field) and right-shifting by (64-length) (to move the field down to bit 0):

out = (*(int64_t*)data) << offset >> (64 - length);

The issue: if my array were longer, there would be no primitive type large enough to hold the complete array, and my solution would no longer work. Is there a better (scaling) way to do this?

In an ideal world I could create a pointer with a bit offset, but this is C++, not an ideal world.

Alternatives I thought about: iterating through the array bit by bit, left-shifting out and adding each bit with +=. Quite inelegant!
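
For comparison, the bit-by-bit loop mentioned above could look roughly like the sketch below. It assumes MSB-first numbering (bit 0 is the most significant bit of data[0]) and uses a hypothetical helper name, extract_bits_loop; it uses | instead of +=, which is equivalent here because each new bit lands in a freshly zeroed position.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper: extracts `length` bits (1..32) starting at bit `offset`.
// Assumes MSB-first numbering: bit 0 is the most significant bit of data[0],
// bit 8 is the most significant bit of data[1], and so on.
uint32_t extract_bits_loop(const uint8_t* data, std::size_t offset, unsigned length) {
    uint32_t out = 0;
    for (std::size_t i = offset; i < offset + length; ++i) {
        unsigned bit = (data[i / 8] >> (7 - i % 8)) & 1u;  // read stream bit i
        out = (out << 1) | bit;                             // append it at the bottom
    }
    return out;
}
```

The input array can be arbitrarily long; only out limits the length. The cost is one iteration per bit, which is the performance concern raised in the question.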

I am aware that there are similar questions out there, but either they have been poorly answered or the answers have hefty performance implications.

Answer 1

First, your approach depends on endianness, i.e. on whether the system stores the most significant byte at the beginning or at the end of the respective 8-byte memory block.
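
A small sketch of that dependence (function names are hypothetical). It copies the bytes with std::memcpy rather than the *(int64_t*)data cast, since the cast additionally breaks C++'s strict-aliasing rules; the second function fixes the byte order explicitly, so its result is the same on every platform.

```cpp
#include <cstdint>
#include <cstring>

// Raw reinterpretation: the result depends on the host. On a little-endian
// machine data[0] becomes the least significant byte, on a big-endian one the most.
uint64_t load_native(const uint8_t (&data)[8]) {
    uint64_t v;
    std::memcpy(&v, data, sizeof v);
    return v;
}

// Explicit assembly: data[0] is always the most significant byte.
uint64_t load_big_endian(const uint8_t (&data)[8]) {
    uint64_t v = 0;
    for (int i = 0; i < 8; ++i)
        v = (v << 8) | data[i];
    return v;
}
```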

Second, I'd use unsigned data types, e.g. uint8_t data[8] and uint32_t out, in order to deal correctly with bit shifts and (automatic) type promotions.

If you know exactly where in data[8] a specific piece of information is stored, and in which byte order, you could write it as follows:

uint32_t out = data[0] + 256*data[1]; 
...

Thereby, your "decoder" will be tied to the order and meaning of the original data; your data may be longer than the largest integral data type; and you avoid the undefined behaviour that shifting signed integral values into the sign bit can introduce.

If your offset is not a multiple of 8, i.e. the "value" does not start at the beginning of a byte, you can still use bit shift operations to correct this. Let's assume that the value starts at an offset of 2 bits; then you could write:

uint32_t out = (data[0] >> 2) + (data[1] << 6) + (data[2] << (6+8));
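
The same idea with a run-time offset and length might be sketched as follows, keeping the answer's byte order (data[0] is the least significant byte, so stream bit i is bit i % 8 of data[i / 8]). The name extract_bits_le is hypothetical, and the sketch assumes 1 <= length <= 32, so the touched bytes (at most five) fit in a uint64_t.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper generalising the fixed shifts above.
// Byte order as in the answer: data[0] is the least significant byte.
// Requires 1 <= length <= 32; reads only the bytes the field touches.
uint32_t extract_bits_le(const uint8_t* data, std::size_t offset, unsigned length) {
    std::size_t first = offset / 8;
    std::size_t last = (offset + length - 1) / 8;   // at most first + 4
    uint64_t acc = 0;
    for (std::size_t b = last + 1; b-- > first; )   // highest byte first
        acc = (acc << 8) | data[b];
    acc >>= offset % 8;                             // drop bits below the field
    uint64_t mask = (uint64_t{1} << length) - 1;    // no overflow since length <= 32
    return static_cast<uint32_t>(acc & mask);
}
```

Because only the bytes covering the field are read, the source buffer can be any length; extract_bits_le(data, 2, 14) yields the low 14 bits of the two-byte expression above.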

In the end, though, your target will be limited to a specific number of bits, since the C++ implementation on your platform fixes the size of each primitive data type; unsigned long long is probably still 64 bits. These sizes are implementation-defined; the standard only guarantees a minimum number of bits for each type. Whether the limit comes from registers or something else, you cannot know; it is implementation-defined.
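
The actual width can be queried at compile time. A minimal sketch (the constant name widest_native_bits is hypothetical); the static_assert only checks the minimum the standard guarantees:

```cpp
#include <limits>

// Number of value bits of the widest standard unsigned type on this platform.
constexpr int widest_native_bits = std::numeric_limits<unsigned long long>::digits;

// The standard guarantees at least 64 value bits; more is allowed.
static_assert(widest_native_bits >= 64, "unsigned long long has at least 64 value bits");
```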

Answer 2

Have you tried std::vector<bool>? It is a specialization of std::vector that combines the dynamic size of vector with the compactness of std::bitset.
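
A sketch of how that could be used for the extraction: the result length is then bounded only by memory, not by an integer type. The helper name extract_to_vector is hypothetical, and LSB-first numbering (stream bit i is bit i % 8 of data[i / 8]) is an assumption.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: copies bits [offset, offset + length) into a vector<bool>.
// Assumes LSB-first numbering: stream bit i is bit (i % 8) of data[i / 8].
std::vector<bool> extract_to_vector(const uint8_t* data, std::size_t offset, std::size_t length) {
    std::vector<bool> bits;
    bits.reserve(length);
    for (std::size_t i = offset; i < offset + length; ++i)
        bits.push_back(((data[i / 8] >> (i % 8)) & 1u) != 0);
    return bits;
}
```

This still copies one bit at a time, so it trades speed for unbounded length.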

Answer 3

I'd use a std::bitset as a temporary: first copy the bytes covering the field into it in a loop (byte-aligned), then perform the bit alignment.

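#include <bitset>

unsigned startbit = offset;
unsigned startbyte = startbit / 8;
unsigned endbit = offset + length - 1;
unsigned endbyte = endbit / 8;

// one spare byte, so a 32-bit field that straddles 5 bytes still fits
std::bitset<8*(sizeof(out) + 1)> align(0);
// byte-aligned copy, highest byte first; "byte-- > startbyte" also stops
// correctly when startbyte == 0 (an unsigned "byte >= startbyte" never fails)
for(unsigned byte = endbyte + 1; byte-- > startbyte; ) {
// for(unsigned byte = startbyte; byte <= endbyte; ++byte) { // check endianness
    align <<= 8;
    align |= static_cast<unsigned char>(data[byte]); // cast avoids sign extension
}
align >>= startbit % 8;         // bit align
align &= (1ULL << length) - 1;  // mask; 1ULL avoids int overflow when length == 32

out = align.to_ullong();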

Answer 4

I use std::bitset and boost::dynamic_bitset to represent and manipulate binary data. std::bitset works well if the length is fixed; otherwise boost::dynamic_bitset is a good choice. With it, you can extract bits using the overloaded bit operators:

#include <boost/dynamic_bitset.hpp>

using boost::dynamic_bitset;

dynamic_bitset<unsigned char> extract(unsigned char* first, unsigned char* last, int offset, int length) {
   dynamic_bitset<unsigned char> bits(first, last);

   bits >>= bits.size() - (offset  + length);
   bits.resize(length);

   return bits;
}

So instead of int32_t out, you can use a dynamic_bitset<> to hold values of arbitrary bit length efficiently.

4 answers

Answer 1: 1, ACCEPTED, 2018-04-10 09:44:02
Answer 2: 1, 2018-04-10 10:01:26
Answer 3: 1, 2018-04-10 11:16:25
Answer 4: 0, 2018-04-10 12:06:05