Fast std::vector<bool> Reset

I have a very large vector of bits (tens of millions of "presence" bits) whose size is known only at run time (i.e. no std::bitset), but known before actual use, so the container can be pre-allocated.

This vector is initially all zeros, with single bits set incrementally, sparsely and randomly. My only use of this container is direct random access: checking "presence" (no STL). After trying several alternative containers, it seems that std::vector<bool> is a good fit for my needs (despite its conceptual problems).

Every once in a while I need to reset all the bits in this vector.
Since it is so big, I cannot afford a full reset of all its elements. However, I know the indices of all the set bits, so I can reset them individually.
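
The per-index reset described above can be sketched as follows. This is only an illustration, assuming the indices are recorded in a side vector as bits get set; the class name TrackedBits and its members are hypothetical and do not come from the original post.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical wrapper: a std::vector<bool> plus a log of the indices that were set.
class TrackedBits {
public:
    explicit TrackedBits(std::size_t n) : bits_(n, false) {}

    void set(std::size_t i) {
        if (!bits_[i]) {              // log each index only once
            bits_[i] = true;
            set_indices_.push_back(i);
        }
    }

    bool test(std::size_t i) const { return bits_[i]; }

    // Cost is proportional to the number of set bits, not to the vector size.
    void reset() {
        for (std::size_t i : set_indices_) bits_[i] = false;
        set_indices_.clear();
    }

    std::size_t set_count() const { return set_indices_.size(); }

private:
    std::vector<bool> bits_;
    std::vector<std::size_t> set_indices_;
};
```

Each `bits_[i] = false` still goes through the proxy reference of std::vector<bool>, which is where the shifts and masks mentioned in the question come from.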

Since std::vector<bool> represents bools as bits, each such reset involves extra shifts and other such adjustment operations.

Is there a (portable) way to do a "rough", low-accuracy reset?
One that will reset the whole integer that my requested bit belongs to, avoiding any additional operations?

Method 1 "Dense":

If you need a "dense" container, I would suggest a mixture of vector and bitset.

We store the bits as a sequence of bitsets, so we can call bitset::reset on each "chunk" to reset them all. DynamicBitset::resize can be used to make room for the correct number of bits.

#include <bitset>
#include <cstddef>
#include <vector>

class DynamicBitset
{
private:
    static const unsigned BITS_LEN = 64; // or 32, or more?
    typedef std::bitset<BITS_LEN> Bits;
    std::vector<Bits> data_;
public:
    DynamicBitset(size_t len=0) {
        resize(len);
    }
    // reset all bits to false
    void reset() {
        for(auto it=data_.begin(); it!=data_.end(); ++it) {
            it->reset(); // we can use the fast bitset::reset :)
        }
    }
    // reset the whole bitset belonging to bit i
    void reset_approx(size_t i) {
        data_[i/BITS_LEN].reset();
    }
    // make room for len bits
    void resize(size_t len) {
        data_.resize(len/BITS_LEN + 1);
    }
    // access a bit
    Bits::reference operator[](size_t i) {
        size_t k = i/BITS_LEN;
        return data_[k][i-k*BITS_LEN];
    }
    bool operator[](size_t i) const {
        size_t k = i/BITS_LEN;
        return data_[k][i-k*BITS_LEN];
    }
};
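
If the std::bitset chunks are swapped for plain integers, the "reset the whole integer" idea from the question becomes a single store. The following is a minimal sketch under that assumption (C++11, fixed-width types from <cstdint>); WordBitset and its member names are hypothetical and are not part of the DynamicBitset class above.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical word-based bit container: one std::uint64_t per 64 bits.
class WordBitset {
public:
    explicit WordBitset(std::size_t len) : words_((len + 63) / 64, 0) {}

    void set(std::size_t i) {
        words_[i / 64] |= std::uint64_t(1) << (i % 64);
    }

    bool test(std::size_t i) const {
        return ((words_[i / 64] >> (i % 64)) & 1u) != 0;
    }

    // "Rough" reset: clears all 64 bits of the word that bit i lives in.
    // No shift or mask is needed, only an index division and one store.
    void reset_word_of(std::size_t i) { words_[i / 64] = 0; }

    std::size_t word_count() const { return words_.size(); }

private:
    std::vector<std::uint64_t> words_;
};
```

Note that reset_word_of clears neighbouring bits as well, which is only acceptable when every set bit is going to be reset anyway, as in the question.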

Method 2 "Sparse":

If you store only very few bits, you can also go with a mixture of map and bitset.

Here we store a chunk only if at least one bit is set in it. This adds a cost to accessing bits, since each access needs a lookup in a std::map, which is O(log N), but it uses much less memory.

Additionally, the reset function does exactly what you asked for in your question: it only touches areas where you have set a bit.

SparseDynamicBitset is a good choice when you have very long runs of bits that are always false, e.g. for 1000...000100...010, and not for 0101000111010110.

#include <bitset>
#include <cstddef>
#include <map>

class SparseDynamicBitset
{
private:
    static const unsigned BITS_LEN = 64; // ?
    typedef std::bitset<BITS_LEN> Bits;
    std::map<unsigned,Bits> data_;
public:
    // reset all "stored" bits to false
    void reset() {
        for(auto it=data_.begin(); it!=data_.end(); ++it) {
            it->second.reset(); // uses bitset::reset :)
        }
    }
    // access a bit
    Bits::reference operator[](size_t i) {
        size_t k = i/BITS_LEN;
        return data_[k][i-k*BITS_LEN];
    }
    bool operator[](size_t i) const {
        size_t k = i/BITS_LEN;
        auto it = data_.find(k);
        if(it == data_.end()) {
            return false; // the default
        }
        return it->second[i-k*BITS_LEN];
    }
};
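
One possible variation, not part of the class above: if it is acceptable to free the chunks, the reset can simply drop the container's contents instead of zeroing each chunk. The sketch below assumes std::unordered_map (average O(1) lookup) in place of std::map; the name HashedSparseBits and its members are hypothetical.

```cpp
#include <bitset>
#include <cstddef>
#include <unordered_map>

// Hypothetical sparse bit container keyed by chunk number.
class HashedSparseBits {
    static const std::size_t BITS_LEN = 64;
    typedef std::bitset<BITS_LEN> Bits;
    std::unordered_map<std::size_t, Bits> chunks_;

public:
    // Creates the chunk on first use, then sets the bit inside it.
    void set(std::size_t i) { chunks_[i / BITS_LEN].set(i % BITS_LEN); }

    // Missing chunks read as all-false without being created.
    bool test(std::size_t i) const {
        auto it = chunks_.find(i / BITS_LEN);
        return it != chunks_.end() && it->second.test(i % BITS_LEN);
    }

    // Touches only chunks that exist; also releases their nodes.
    void reset() { chunks_.clear(); }

    std::size_t chunk_count() const { return chunks_.size(); }
};
```

Whether clearing beats zeroing in place depends on the allocation pattern: clear() pays for node deallocation now and reallocation later, so it is worth measuring before adopting.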

Is there a (portable) way to do a "rough", low-accuracy reset? One that will reset the whole integer that my requested bit belongs to, avoiding any additional operations?

No, and no. (Not for that container type.)

FWIW, I wrote a data type that may help you with large sparse bit sets (you'll need to wrap it in an outer type that creates an array of them, etc.). The type is 32 bits wide and tracks the on/off status of 1024 bits by using the 32 bits either to store a pointer to a vector<unsigned> (worst case: more than 3 bits set) or, typically, as a 2-bit size/count plus up to 3 embedded 10-bit indices of set bits. When the vector is not needed, this achieves a 32x storage density improvement over a std::bitset<>, which should help reduce memory cache misses.
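
To make the embedded layout concrete, here is a small sketch that packs a 2-bit count and three 10-bit indices into one 32-bit word using explicit shifts and masks. It illustrates the idea only; the answer's own code uses bit-fields in a union instead, and the helper names pack3, packed_count and packed_index are hypothetical.

```cpp
#include <cstdint>

// Assumed layout: bits 0-1 hold the count, bits 2-11, 12-21 and 22-31
// hold the three 10-bit indices (each in the range 0..1023).
inline std::uint32_t pack3(unsigned count, unsigned a, unsigned b, unsigned c) {
    return static_cast<std::uint32_t>(count & 0x3u)
         | (static_cast<std::uint32_t>(a & 0x3FFu) << 2)
         | (static_cast<std::uint32_t>(b & 0x3FFu) << 12)
         | (static_cast<std::uint32_t>(c & 0x3FFu) << 22);
}

// Number of embedded indices currently in use (0..3).
inline unsigned packed_count(std::uint32_t w) { return w & 0x3u; }

// Index stored in slot 0, 1 or 2.
inline unsigned packed_index(std::uint32_t w, unsigned slot) {
    return (w >> (2 + 10 * slot)) & 0x3FFu;
}
```

Three indices of 10 bits plus a 2-bit count use exactly 32 bits, which is why the embedded form tops out at three set bits per 1024-bit block.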

NOTES: It's crucial that the size member, in_use, be aligned over the least significant bits of the vector*, such that any legitimate pointer (which will have at least 4-byte alignment) will necessarily have an in_use value of 0. Written for 32-bit apps where uintptr_t is 32 bits wide, but the same concept can be used to create a 64-bit version; the data type needs to use uintptr_t so it can store a pointer-to-vector when needed, so it must match the 32- or 64-bit execution mode.
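
The alignment assumption in these notes can be checked directly. The sketch below assumes a mainstream platform where converting a pointer to uintptr_t yields its address (the mapping is implementation-defined in C++); the helper name low_two_bits_clear is hypothetical.

```cpp
#include <cstdint>
#include <vector>

// On common platforms std::vector<unsigned> holds pointers, so its
// alignment is at least 4; this fails to compile where that is not true.
static_assert(alignof(std::vector<unsigned>) >= 4,
              "pointer tagging needs at least 4-byte alignment");

// True when the two least significant bits of the pointer value are zero,
// i.e. when a 2-bit tag stored in those bits would read as 0.
inline bool low_two_bits_clear(const void* p) {
    return (reinterpret_cast<std::uintptr_t>(p) & 0x3u) == 0;
}
```

A check like this could be asserted right after allocating the vector, before its address is stored in the shared 32-bit (or 64-bit) word.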

The code includes tests showing that random operations affect bitsets and Bit_Packer_32s identically.

#include <iostream>
#include <bitset>
#include <vector>
#include <algorithm>
#include <set>
#include <cstdint>   // uintptr_t
#include <cstdlib>   // rand

class Bit_Packer_32
{
    // Invariants:
    // - 0 bits set: u_ == p_ == a == b == c == 0
    // - 1 bit set:  in_use == 1, a meaningful, b == c == 0
    // - 2 bits set: in_use == 2, a & b meaningful, c == 0
    // - 3 bits set: in_use == 3, a & b & c meaningful
    // - >3 bits:    in_use == 0, p_ != 0
    // NOTE: differentiating 0 from >3 depends on a == b == c == 0 for former

  public:
    Bit_Packer_32() : u_(0) { }

    class Reference
    {
      public:
        Reference(Bit_Packer_32& p, size_t n) : p_(p), n_(n) { }

        Reference& operator=(bool b) { p_.set(n_, b); return *this; }
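        // Cast to const so the const operator[] (returning bool) is chosen;
        // otherwise this conversion would call itself recursively.
        operator bool() const { return static_cast<const Bit_Packer_32&>(p_)[n_]; }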

      private:
        Bit_Packer_32& p_;
        size_t n_;
    };

    Reference operator[](size_t n) { return Reference(*this, n); }

    void set(size_t n)
    {
        switch (in_use)
        {
          case 0:
            if (p_)
            {
               if (std::find(p_->begin(), p_->end(), n) == p_->end())
                   p_->push_back(n);
            }
            else
            { in_use = 1; a = n; }
            break;
          case 1: if (a != n) { in_use = 2; b = n; } break;
          case 2: if (a != n && b != n) { in_use = 3; c = n; } break;
          case 3: if (a == n || b == n || c == n) break;
                  V* p = new V(4);
                  (*p)[0] = a; (*p)[1] = b; (*p)[2] = c; (*p)[3] = n;
                  p_ = p;
        }
    }

    void reset(size_t n)
    {
        switch (in_use)
        {
          case 0:
            if (p_)
            {
                V::iterator i = std::find(p_->begin(), p_->end(), n);
                if (i != p_->end())
                {
                    p_->erase(i);
                    if (p_->size() == 3)
                    {
                        // faster to copy out w/o erase, but tedious
                        int p0 = (*p_)[0], p1 = (*p_)[1], p2 = (*p_)[2];
                        delete p_;
                        a = p0; b = p1; c = p2;
                        in_use = 3;
                    }
                }
            }
            break;

          case 1: if (a == n) { u_ = 0; /* in_use = a = 0 */ } break;
          case 2: if (a == n) { in_use = 1; a = b; b = 0; }
                  else if (b == n) { in_use = 1; b = 0; }
                  break; // do not fall through into case 3
          case 3:      if (a == n) a = c;
                  else if (b == n) b = c;
                  else if (c == n) ;
                  else break;
                  in_use = 2;
                  c = 0;
        }
    }

    void reset_all()
    {
        if (in_use == 0) delete p_;
        u_ = 0;
    }

    size_t count() const { return in_use ? in_use : p_ ? p_->size() : 0; }

    void set(size_t n, bool b) { if (b) set(n); else reset(n); }

    bool operator[](size_t n) const
    {
        switch (in_use)
        {
          case 0:
            return p_ && std::find(p_->begin(), p_->end(), n) != p_->end();
          case 1: return n == a;
          case 2: return n == a || n == b;
          case 3: return n == a || n == b || n == c;
        }
        return false; // not reached: in_use is a 2-bit field, so 0..3 cover all values
    }

    // e.g. operate_on<std::bitset<1024>, Op_Set>()
    //      operate_on<std::set<unsigned>, Op_Insert>()
    //      operate_on<std::vector<unsigned>, Op_Push_Back>()
    template <typename T, typename Op>
    T operate_on(const Op& op = Op()) const
    {
        T result;
        switch (in_use)
        {
          case 0:
            if (p_)
                for (V::const_iterator i = p_->begin(); i != p_->end(); ++i)
                     op(result, *i);
            break;
          case 3: op(result, c);
          case 2: op(result, b);
          case 1: op(result, a);
        }
        return result;
    }

  private:
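    // Declared outside the union: an anonymous union may only declare
    // non-static data members, not nested types.
    typedef std::vector<unsigned> V;

    union
    {
        uintptr_t u_;
        V* p_;
        struct // anonymous struct: a widely supported compiler extension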
        {
            unsigned in_use : 2;
            unsigned a : 10;
            unsigned b : 10;
            unsigned c : 10;
        };
    };
};

struct Op_Insert
{
    template <typename T, typename U>
    void operator()(T& t, const U& u) const { t.insert(u); }
};

struct Op_Set
{
    template <typename T, typename U>
    void operator()(T& t, const U& u) const { t.set(u); }
};

struct Op_Push_Back
{
    template <typename T, typename U>
    void operator()(T& t, const U& u) const { t.push_back(u); }
};

#define TEST(A, B, MSG) \
    do { \
        bool pass = (A) == (B); \
        if (pass) break; \
        std::cout << "@" << __LINE__ << ": (" #A " == " #B ") "; \
        std::cout << (pass ? "pass\n" : "FAIL\n"); \
        std::cout << "  (" << (A) << " ==\n"; \
        std::cout << "   " << (B) << ")\n"; \
        std::cout << MSG << std::endl; \
    } while (false)

template <size_t N>
std::set<unsigned> to_set(const std::bitset<N>& bs)
{
    std::set<unsigned> result;
    for (unsigned i = 0; i < N; ++i)
        if (bs[i]) result.insert(i);
    return result;
}

template <typename T>
std::ostream& operator<<(std::ostream& os, const std::set<T>& s)
{
    for (typename std::set<T>::const_iterator i = s.begin(); i != s.end(); ++i)
    {
        if (i != s.begin()) os << ' ';
        os << *i;
    }
    return os;
}

int main()
{
    TEST(sizeof(uintptr_t), 4, "designed for 32-bit uintptr_t");

    for (int i = 0; i < 100000; ++i)
    {
        Bit_Packer_32 bp;
        std::bitset<1024> bs;
        for (int j = 0; j < 1 + i % 10; ++j)
        {
            int n = rand() % 1024;
            int v = rand() % 2;
            bs[n] = v;
            bp[n] = v;
            // TEST(bp.get_bitset(), bs);
            TEST((bp.operate_on<std::set<unsigned>, Op_Insert>()), to_set(bs),
                 "j " << j << ", n " << n << ", v " << v);
        }
    }
}

No, there is no portable way to do so, because the standard does not require std::vector<bool> to pack its elements into bits; it only recommends it. If you want to be portable, you might want to implement your own container, based for example on a std::vector<uint8_t> or a std::bitset.
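
One way to read the std::vector<uint8_t> suggestion is one byte per presence flag. That interpretation is an assumption, as are the name ByteFlags and its members in the sketch below. It uses eight times the memory of a packed representation, but setting or resetting a flag is a plain byte store with no shifts or masks.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical byte-per-flag container.
class ByteFlags {
public:
    explicit ByteFlags(std::size_t n) : flags_(n, 0) {}

    void set(std::size_t i)   { flags_[i] = 1; }
    void reset(std::size_t i) { flags_[i] = 0; }   // single byte store
    bool test(std::size_t i) const { return flags_[i] != 0; }

    // Full reset, for the cases where touching every element is affordable.
    void reset_all() { std::fill(flags_.begin(), flags_.end(), std::uint8_t(0)); }

private:
    std::vector<std::uint8_t> flags_;
};
```

With tens of millions of flags this costs tens of megabytes, so the extra cache pressure may outweigh the cheaper stores; it is a trade-off to benchmark rather than a guaranteed win.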
