Interpreting a std::vector<unsigned int> as bitvector - efficient algorithm?

Question

I would like to interpret

std::vector<unsigned int> numbers

as a bitvector, i.e. the MSB of numbers[0] is the 1st bit, the MSB of numbers[1] is the 33rd bit, and so on. I want to find all sequences of ones in this vector and store the corresponding positions in a data structure. (A single one also counts as a sequence here.)

For example: I have the values 15 and 112 stored in numbers. Thus bits 29 to 32 and bits 58 to 60 are equal to one. The challenge is to optimize the runtime of this function.

Here is my idea of how to handle this: I thought of working with two for-loops. The first loop iterates through the elements of "numbers" (let's call it element_loop), while the second loop figures out the positions of all ones within a single element (let's call it bit_loop). I thought of detecting the "rising" and "falling" edges of a sequence for that purpose.

At the beginning of every bit_loop cycle, a mask is initialized to the hex value 0x80000000. With this mask I check whether the 1st bit is equal to one. If yes, the current position (0) is stored, and the mask, "10 00..." in binary representation, is then used to detect a "falling edge" in the next cycle. If no, the mask is shifted one bit to the right, "01 00...", in order to detect a "rising edge" in the next cycle. (I only care about the leading pair of bits in each pattern.)

Once an edge is detected, I store the current position and shift the mask by one bit in the appropriate way. So after a positive edge (01) I switch to negative edge detection (10), and the other way round. While iterating through the 32-bit unsigned number, I store all edge positions in some kind of vector. This vector could be a two-dimensional array, with the first column being the start of a sequence of ones and the second column the end of the sequence. Furthermore, I will need some special treatment for the turnover from one element to the next.
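The edge-detection idea described above can be sketched roughly as follows. This is only a minimal illustration, not the asker's actual code: the function name find_runs_bitwise is made up here, positions are 1-based and counted from the MSB as in the example, and unsigned int is assumed to be 32 bits wide (hence std::uint32_t). Carrying the in_run flag across elements is one way to handle the turnover from one element to the next.

```cpp
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical helper: returns [first, last] pairs for every run of ones.
// Positions are 1-based and counted MSB-first across the whole vector.
// Assumption: each element is exactly 32 bits wide.
std::vector<std::pair<std::size_t, std::size_t>>
find_runs_bitwise(const std::vector<std::uint32_t>& numbers) {
    std::vector<std::pair<std::size_t, std::size_t>> runs;
    bool in_run = false;    // state carried over element boundaries
    std::size_t start = 0;  // start of the run currently open
    std::size_t pos = 0;    // global 1-based bit position

    for (std::uint32_t word : numbers) {                  // element_loop
        for (std::uint32_t mask = 0x80000000u; mask != 0; mask >>= 1) {  // bit_loop
            ++pos;
            const bool bit = (word & mask) != 0;
            if (bit && !in_run) {            // rising edge: a run starts here
                start = pos;
                in_run = true;
            } else if (!bit && in_run) {     // falling edge: run ended one bit earlier
                runs.emplace_back(start, pos - 1);
                in_run = false;
            }
        }
    }
    if (in_run) {                            // run reaches the very last bit
        runs.emplace_back(start, pos);
    }
    return runs;
}
```

For the values 15 and 112 from the example, this yields the pairs (29, 32) and (58, 60). It still touches every bit once, so it is a baseline to measure against rather than an optimized version.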

Here's my general question: What do you think of this approach? Is there a way to handle this more efficiently? Thanks a lot for your help in advance.

Ben

Answer 1

There are various bitwise tricks to do bit scans efficiently, but if you're using C++ you can take advantage of either std::bitset or boost::dynamic_bitset to iterate over bit positions. The algorithm you described iterates backwards for each block though (these containers index bit 0 as the least significant bit), so you would want to invert your positions using something like 31 - i for a zero-based index i within a 32-bit block.

Depending on the architecture, each bit should take roughly a cycle.
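As a rough sketch of the std::bitset route: std::bitset numbers its bits from the least significant end, so index i there corresponds to the zero-based MSB-first offset 31 - i. The function name set_positions and the word_index parameter are made up for this illustration, and 32-bit elements are assumed.

```cpp
#include <bitset>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: lists the 1-based, MSB-first positions of all set
// bits of one element. word_index is the element's index in the vector.
// Assumption: each element is exactly 32 bits wide.
std::vector<std::size_t> set_positions(std::uint32_t word, std::size_t word_index) {
    const std::bitset<32> bits(word);
    std::vector<std::size_t> positions;
    for (std::size_t k = 0; k < 32; ++k) {   // k = zero-based offset from the MSB
        const std::size_t i = 31 - k;        // matching std::bitset index (0 = LSB)
        if (bits[i]) {
            positions.push_back(word_index * 32 + k + 1);
        }
    }
    return positions;
}
```

With the example values, set_positions(15, 0) gives 29, 30, 31, 32 and set_positions(112, 1) gives 58, 59, 60. Grouping consecutive positions into sequences would still have to be done on top of this.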

Answer 2

There are efficient (constant time) methods for finding the first bit set in a word, using either special processor instructions or various clever tricks (see e.g. "Position of least significant bit that is set").

With a bit of care you could work backwards and use those to scan for the first one, then do some masking and bit flipping and search for the next zero, and so on.

This might give you a faster algorithm, especially if the sequences are long on average so the gain on the fast scans outweighs the cost of the bit twiddling.
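A possible sketch of this scan-based approach is shown below. It is an illustration under several assumptions, not a tuned implementation: the names leading_zeros, Run and find_runs_scan are invented here, elements are assumed to be 32 bits wide, and the scan counts leading zeros from the MSB side (rather than locating the least significant set bit) because the question numbers bits MSB-first. On GCC and Clang the helper uses the __builtin_clz intrinsic; elsewhere it falls back to a plain loop.

```cpp
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Number of leading zero bits in x; defined to return 32 for x == 0.
inline unsigned leading_zeros(std::uint32_t x) {
    if (x == 0) return 32;  // __builtin_clz(0) is undefined, so handle it here
#if defined(__GNUC__) || defined(__clang__)
    return static_cast<unsigned>(__builtin_clz(x));
#else
    unsigned n = 0;         // portable fallback: shift until the MSB is set
    while ((x & 0x80000000u) == 0) { x <<= 1; ++n; }
    return n;
#endif
}

using Run = std::pair<std::size_t, std::size_t>;  // [first, last], 1-based, MSB-first

std::vector<Run> find_runs_scan(const std::vector<std::uint32_t>& numbers) {
    std::vector<Run> runs;
    bool in_run = false;    // a run may continue across element boundaries
    std::size_t start = 0;
    for (std::size_t w = 0; w < numbers.size(); ++w) {
        const std::size_t base = w * 32;
        unsigned p = 0;     // bits of this element already consumed, MSB first
        while (p < 32) {
            const std::uint32_t rest = numbers[w] << p;  // unconsumed bits, left-aligned
            if (in_run) {
                // Skip the ones: leading ones of rest == leading zeros of ~rest.
                p += leading_zeros(static_cast<std::uint32_t>(~rest));
                if (p < 32) {                 // a zero was found inside this element
                    runs.emplace_back(start, base + p);
                    in_run = false;
                }
            } else {
                if (rest == 0) break;         // no more ones in this element
                p += leading_zeros(rest);     // jump straight to the next one
                start = base + p + 1;
                in_run = true;
            }
        }
    }
    if (in_run) runs.emplace_back(start, numbers.size() * 32);
    return runs;
}
```

For the values 15 and 112 this produces (29, 32) and (58, 60). Each loop iteration skips a whole run of equal bits, so the number of iterations per element depends on the number of edges rather than on the number of bits; whether that actually beats a simple bit loop is something to measure on the target data.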

2 answers

solution1
1 2015-10-16 17:57:39

solution2
0 ACCEPTED 2015-10-16 18:16:23
