Fast std::vector<bool> Reset

Question

I have a very large vector of bits (tens of millions of "presence" bits) whose size is known only at run time (i.e. no std::bitset), but is known before actual use, so the container can be pre-allocated.

This vector is initially all zeros with single bits set incrementally, sparsely and randomly. My only use of this container is direct random access - checking "presence" (no STL). After trying several alternative containers it seems that std::vector<bool> is a good fit for my needs (despite its conceptual problems).

Every once in a while I need to reset all the bits in this vector.
Since it is so big, I cannot afford a full reset of all its elements. However, I know the indices of all the set bits so I can reset them individually.

Since std::vector<bool> represents bools as bits, each such reset involves extra shifts and other adjustment operations.

Is there a (portable) way to do a "rough", low-accuracy, reset?
One that will reset the whole integer that my requested bit belongs to, avoiding any additional operations?

Answer 1

Method 1 "Dense" :

If you need a "dense" container I would suggest you use a mixture of vector and bitset.

We store bits as a sequence of bitsets, thus we can use bitset::reset on each "chunk" to reset them all. DynamicBitset::resize can be used to make room for the correct number of bits.

#include <bitset>
#include <cstddef>
#include <vector>

class DynamicBitset
{
private:
    static const unsigned BITS_LEN = 64; // or 32, or more?
    typedef std::bitset<BITS_LEN> Bits;
    std::vector<Bits> data_;
public:
    DynamicBitset(size_t len=0) {
        resize(len);
    }
    // reset all bits to false
    void reset() {
        for(auto it=data_.begin(); it!=data_.end(); ++it) {
            it->reset(); // we can use the fast bitset::reset :)
        }
    }
    // reset the whole bitset belonging to bit i
    void reset_approx(size_t i) {
        data_[i/BITS_LEN].reset();
    }
    // make room for len bits
    void resize(size_t len) {
        data_.resize(len/BITS_LEN + 1);
    }
    // access a bit
    Bits::reference operator[](size_t i) {
        size_t k = i/BITS_LEN;
        return data_[k][i-k*BITS_LEN];
    }
    bool operator[](size_t i) const {
        size_t k = i/BITS_LEN;
        return data_[k][i-k*BITS_LEN];
    }
};
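
To connect the chunked layout back to the question, here is a minimal sketch of how the asker's "I know the indices of all the set bits" bookkeeping could drive reset_approx. ChunkedBits is a condensed stand-in for the class above, and PresenceSet and its member names are hypothetical, introduced only for illustration.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <vector>

// Condensed stand-in for DynamicBitset: a vector of fixed-size bitset chunks.
class ChunkedBits {
    static const std::size_t BITS_LEN = 64;
    typedef std::bitset<BITS_LEN> Bits;
    std::vector<Bits> data_;
public:
    explicit ChunkedBits(std::size_t len) : data_(len / BITS_LEN + 1) {}
    std::size_t capacity() const { return data_.size() * BITS_LEN; }
    void set(std::size_t i) { data_[i / BITS_LEN].set(i % BITS_LEN); }
    bool test(std::size_t i) const { return data_[i / BITS_LEN].test(i % BITS_LEN); }
    // Clears the whole chunk that bit i belongs to.
    void reset_approx(std::size_t i) { data_[i / BITS_LEN].reset(); }
};

// Hypothetical wrapper: remembers which indices were set, so clear()
// touches only the chunks that can contain a set bit.
class PresenceSet {
    ChunkedBits bits_;
    std::vector<std::size_t> touched_;
public:
    explicit PresenceSet(std::size_t len) : bits_(len) {}
    void insert(std::size_t i) {
        assert(i < bits_.capacity());
        if (!bits_.test(i)) {
            bits_.set(i);
            touched_.push_back(i);
        }
    }
    bool contains(std::size_t i) const { return bits_.test(i); }
    std::size_t size() const { return touched_.size(); }
    void clear() {
        for (std::size_t i : touched_) bits_.reset_approx(i);
        touched_.clear();
    }
};
```

The cost of clear() is proportional to the number of set bits rather than to the size of the container, at the price of one stored index per set bit.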

Method 2 "Sparse" :

If you store only very few bits, you can also go with a mixture of map and bitset.

Here we store chunks only if there is at least one bit set in them. This adds a cost to every bit access, since we need a lookup in a std::map, which is O(log N) in the number of stored chunks, but it uses much less memory.

Additionally the function reset does exactly what you stated in your question - it only touches areas where you have set a bit.

SparseDynamicBitset is a good choice when you have very long runs of bits which are always false, e.g. 1000...000100...010, but not for dense patterns such as 0101000111010110.

#include <bitset>
#include <cstddef>
#include <map>

class SparseDynamicBitset
{
private:
    static const unsigned BITS_LEN = 64; // ?
    typedef std::bitset<BITS_LEN> Bits;
    std::map<size_t,Bits> data_; // key is the chunk index (size_t, to match i/BITS_LEN)
public:
    // reset all "stored" bits to false
    void reset() {
        for(auto it=data_.begin(); it!=data_.end(); ++it) {
            it->second.reset(); // uses bitset::reset :)
        }
    }
    // access a bit
    Bits::reference operator[](size_t i) {
        size_t k = i/BITS_LEN;
        return data_[k][i-k*BITS_LEN];
    }
    bool operator[](size_t i) const {
        size_t k = i/BITS_LEN;
        auto it = data_.find(k);
        if(it == data_.end()) {
            return false; // the default
        }
        return it->second[i-k*BITS_LEN];
    }
};
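
One detail worth seeing in isolation: std::map::operator[] inserts a default-constructed chunk on a miss, so presence checks should go through a find()-based path if the map is to stay sparse. The sketch below is a condensed, hypothetical variant (SparseBits, chunk_count and sparse_bits_demo are names made up here). Its reset() also differs from the class above: it drops the stored chunks with clear() instead of zeroing them in place, which frees memory but means the next set() has to allocate again.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <map>

class SparseBits {
    static const std::size_t BITS_LEN = 64;
    typedef std::bitset<BITS_LEN> Bits;
    std::map<std::size_t, Bits> data_;
public:
    // Creates the chunk on demand, then sets the bit inside it.
    void set(std::size_t i) { data_[i / BITS_LEN].set(i % BITS_LEN); }

    // Read-only lookup: find() never creates a chunk on a miss.
    bool test(std::size_t i) const {
        std::map<std::size_t, Bits>::const_iterator it = data_.find(i / BITS_LEN);
        return it != data_.end() && it->second.test(i % BITS_LEN);
    }

    // Drops every stored chunk instead of zeroing each one in place.
    void reset() { data_.clear(); }

    std::size_t chunk_count() const { return data_.size(); }
};

// Usage sketch: reads do not grow the map, and reset() empties it.
inline void sparse_bits_demo() {
    SparseBits s;
    s.set(5);
    s.set(1000000);
    assert(s.chunk_count() == 2);
    assert(!s.test(6));       // miss inside an existing chunk
    assert(!s.test(123456));  // miss in a chunk that was never stored
    assert(s.chunk_count() == 2);
    s.reset();
    assert(s.chunk_count() == 0 && !s.test(5));
}
```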

Answer 2

Is there a (portable) way to do a "rough", low-accuracy, reset? One that will reset the whole integer that my requested bit belongs to, avoiding any extra additional operations?

No, and no. (Not for that container type)

FWIW, I wrote a data type that may help you with large sparse bit sets (you'll need to wrap it in an outer type that creates an array of them, etc.). The type is 32 bits wide and tracks the on/off status of 1024 bits. In the typical case it uses the 32 bits as a 2-bit size/count plus up to three embedded 10-bit indices of set bits; in the worst case (more than 3 bits set) it stores a pointer to a vector<unsigned> instead. When the vector's not needed, this achieves a 32x storage density improvement over a std::bitset<>, which should help reduce cache misses.

NOTES: It's crucial that the size member, in_use, overlays the least significant bits of the vector pointer, so that any legitimate pointer (which will have at least 4-byte alignment) necessarily has an in_use value of 0. The code is written for 32-bit apps where uintptr_t is 32 bits wide, but the same concept can be used to create a 64-bit version; the data type needs to use uintptr_t so it can store a pointer-to-vector when needed, so it must match the 32- or 64-bit execution mode.

Code includes tests showing random operations affect bitsets and Bit_Packer_32s identically.
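
The inline encoding described above (a 2-bit count plus up to three 10-bit indices) can also be sketched with explicit shifts and masks, which makes the layout visible. This is only an illustration of the inline mode and does not cover the pointer fallback. The helper names and the exact bit positions (count in bits 0-1, then a, b, c) are assumptions; the class below relies on the compiler's bit-field layout instead.

```cpp
#include <cassert>
#include <cstdint>

// Assumed layout: bits 0-1 = count, bits 2-11 = a, bits 12-21 = b, bits 22-31 = c.
inline std::uint32_t pack(std::uint32_t count, std::uint32_t a,
                          std::uint32_t b, std::uint32_t c) {
    assert(count <= 3 && a < 1024 && b < 1024 && c < 1024);
    return count | (a << 2) | (b << 12) | (c << 22);
}

// Number of indices stored inline (0 to 3).
inline std::uint32_t count_of(std::uint32_t w) { return w & 0x3u; }

// The 10-bit index stored in slot 0 (a), 1 (b) or 2 (c).
inline std::uint32_t index_of(std::uint32_t w, unsigned slot) {
    assert(slot < 3);
    return (w >> (2 + 10 * slot)) & 0x3FFu;
}

// True if bit n (0..1023) is recorded as set in the packed word.
inline bool contains(std::uint32_t w, std::uint32_t n) {
    for (unsigned s = 0; s < count_of(w); ++s)
        if (index_of(w, s) == n) return true;
    return false;
}
```

With this layout a word whose low two bits are zero carries no inline indices, which is what lets a 4-byte-aligned pointer share the same 32 bits.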

#include <iostream>
#include <bitset>
#include <vector>
#include <algorithm>
#include <set>
#include <cstddef>  // size_t
#include <cstdint>  // uintptr_t
#include <cstdlib>  // rand

class Bit_Packer_32
{
    // Invariants:
    // - 0 bits set: u_ == p_ == a == b == c == 0
    // - 1 bit set:  in_use == 1, a meaningful, b == c == 0
    // - 2 bits set: in_use == 2, a & b meaningful, c == 0
    // - 3 bits set: in_use == 3, a & b & c meaningful
    // - >3 bits:    in_use == 0, p_ != 0
    // NOTE: differentiating 0 from >3 depends on a == b == c == 0 for former

  public:
    Bit_Packer_32() : u_(0) { }

    class Reference
    {
      public:
        Reference(Bit_Packer_32& p, size_t n) : p_(p), n_(n) { }

        Reference& operator=(bool b) { p_.set(n_, b); return *this; }
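        // read through the const overload; the non-const operator[] returns
        // a Reference and would recurse back into this conversion
        operator bool() const { return static_cast<const Bit_Packer_32&>(p_)[n_]; }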

      private:
        Bit_Packer_32& p_;
        size_t n_;
    };

    Reference operator[](size_t n) { return Reference(*this, n); }

    void set(size_t n)
    {
        switch (in_use)
        {
          case 0:
            if (p_)
            {
               if (std::find(p_->begin(), p_->end(), n) == p_->end())
                   p_->push_back(n);
            }
            else
            { in_use = 1; a = n; }
            break;
          case 1: if (a != n) { in_use = 2; b = n; } break;
          case 2: if (a != n && b != n) { in_use = 3; c = n; } break;
          case 3: if (a == n || b == n || c == n) break;
                  V* p = new V(4);
                  (*p)[0] = a; (*p)[1] = b; (*p)[2] = c; (*p)[3] = n;
                  p_ = p;
        }
    }

    void reset(size_t n)
    {
        switch (in_use)
        {
          case 0:
            if (p_)
            {
                V::iterator i = std::find(p_->begin(), p_->end(), n);
                if (i != p_->end())
                {
                    p_->erase(i);
                    if (p_->size() == 3)
                    {
                        // faster to copy out w/o erase, but tedious
                        int p0 = (*p_)[0], p1 = (*p_)[1], p2 = (*p_)[2];
                        delete p_;
                        a = p0; b = p1; c = p2;
                        in_use = 3;
                    }
                }
            }
            break;

          case 1: if (a == n) { u_ = 0; /* in_use = a = 0 */ } break;
          case 2: if (a == n) { in_use = 1; a = b; b = 0; }
                  else if (b == n) { in_use = 1; b = 0; }
                  break; // no match: nothing to do (was an implicit fall-through)
          case 3:      if (a == n) a = c;
                  else if (b == n) b = c;
                  else if (c == n) ;
                  else break;
                  in_use = 2;
                  c = 0;
        }
    }

    void reset_all()
    {
        if (in_use == 0) delete p_;
        u_ = 0;
    }

    size_t count() const { return in_use ? in_use : p_ ? p_->size() : 0; }

    void set(size_t n, bool b) { if (b) set(n); else reset(n); }

    bool operator[](size_t n) const
    {
        switch (in_use)
        {
          case 0:
            return p_ && std::find(p_->begin(), p_->end(), n) != p_->end();
          case 1: return n == a;
          case 2: return n == a || n == b;
          case 3: return n == a || n == b || n == c;
        }
        return false; // not reached: in_use is a 2-bit field
    }

    // e.g. operate_on<std::bitset<1024>, Op_Set>()
    //      operate_on<std::set<unsigned>, Op_Insert>()
    //      operate_on<std::vector<unsigned>, Op_Push_Back>()
    template <typename T, typename Op>
    T operate_on(const Op& op = Op()) const
    {
        T result;
        switch (in_use)
        {
          case 0:
            if (p_)
                for (V::const_iterator i = p_->begin(); i != p_->end(); ++i)
                     op(result, *i);
            break;
          case 3: op(result, c);
          case 2: op(result, b);
          case 1: op(result, a);
        }
        return result;
    }

  private:
    typedef std::vector<unsigned> V; // kept outside the union: an anonymous union may only declare data members
    union
    {
        uintptr_t u_;
        V* p_;
        struct
        {
            unsigned in_use : 2;
            unsigned a : 10;
            unsigned b : 10;
            unsigned c : 10;
        };
    };
};

struct Op_Insert
{
    template <typename T, typename U>
    void operator()(T& t, const U& u) const { t.insert(u); }
};

struct Op_Set
{
    template <typename T, typename U>
    void operator()(T& t, const U& u) const { t.set(u); }
};

struct Op_Push_Back
{
    template <typename T, typename U>
    void operator()(T& t, const U& u) const { t.push_back(u); }
};

#define TEST(A, B, MSG) \
    do { \
        bool pass = (A) == (B); \
        if (pass) break; \
        std::cout << "@" << __LINE__ << ": (" #A " == " #B ") "; \
        std::cout << (pass ? "pass\n" : "FAIL\n"); \
        std::cout << "  (" << (A) << " ==\n"; \
        std::cout << "   " << (B) << ")\n"; \
        std::cout << MSG << std::endl; \
    } while (false)

template <size_t N>
std::set<unsigned> to_set(const std::bitset<N>& bs)
{
    std::set<unsigned> result;
    for (unsigned i = 0; i < N; ++i)
        if (bs[i]) result.insert(i);
    return result;
}

template <typename T>
std::ostream& operator<<(std::ostream& os, const std::set<T>& s)
{
    for (typename std::set<T>::const_iterator i = s.begin(); i != s.end(); ++i)
    {
        if (i != s.begin()) os << ' ';
        os << *i;
    }
    return os;
}

int main()
{
    TEST(sizeof(uintptr_t), 4, "designed for 32-bit uintptr_t");

    for (int i = 0; i < 100000; ++i)
    {
        Bit_Packer_32 bp;
        std::bitset<1024> bs;
        for (int j = 0; j < 1 + i % 10; ++j)
        {
            int n = rand() % 1024;
            int v = rand() % 2;
            bs[n] = v;
            bp[n] = v;
            // TEST(bp.get_bitset(), bs);
            TEST((bp.operate_on<std::set<unsigned>, Op_Insert>()), to_set(bs),
                 "j " << j << ", n " << n << ", v " << v);
        }
    }
}

Answer 3

No, there is no portable way to do so: the standard only recommends, and does not require, that std::vector<bool> pack its elements as bits, so you cannot rely on how the underlying storage is laid out. If you want to be portable, you might want to implement your own container, based for example on std::vector<uint8_t> or std::bitset.
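
As a rough illustration of the "implement your own" route, here is a minimal sketch of a bit vector over std::vector<std::uint64_t>, where clearing the whole word that a bit belongs to is a single store. The class and member names are hypothetical, and 64-bit words are used instead of the uint8_t mentioned above purely as an assumption about what suits the target machine.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

class WordBitVector {
public:
    // Size is fixed at construction; all bits start as zero.
    explicit WordBitVector(std::size_t bits)
        : bits_(bits), words_((bits + 63) / 64, std::uint64_t(0)) {}

    void set(std::size_t i) {
        assert(i < bits_);
        words_[i / 64] |= std::uint64_t(1) << (i % 64);
    }

    bool test(std::size_t i) const {
        assert(i < bits_);
        return ((words_[i / 64] >> (i % 64)) & 1u) != 0;
    }

    // "Rough" reset: clears all 64 bits of the word containing bit i.
    void reset_word_of(std::size_t i) {
        assert(i < bits_);
        words_[i / 64] = 0;
    }

    // Full reset, for comparison with the per-index approach.
    void reset_all() { std::fill(words_.begin(), words_.end(), std::uint64_t(0)); }

private:
    std::size_t bits_;
    std::vector<std::uint64_t> words_;
};
```

Because the word size and bit order are chosen by the class itself, reset_word_of behaves the same on every conforming implementation, which is the guarantee std::vector<bool> does not give.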

3 answers

solution1
3 ACCEPTED 2014-04-07 11:14:08

solution2
1 2014-04-07 11:11:18

solution3
1 2014-04-07 11:16:57
