Fast std::vector<bool> Reset
I have a very large vector of bits (tens of millions of "presence" bits) whose size is known only at run time (i.e. no std::bitset), but known before actual use, so the container can be pre-allocated.
This vector is initially all zeros, with single bits set incrementally, sparsely and randomly. My only use of this container is direct random access: checking "presence" (no STL). After trying several alternative containers, it seems that std::vector<bool> is a good fit for my needs (despite its conceptual problems).
Every once in a while I need to reset all the bits in this vector. Since it is so big, I cannot afford a full reset of all its elements. However, I know the indices of all the set bits, so I can reset them individually.
Since std::vector<bool> represents bools as bits, each such reset involves extra shifts and other such adjustment operations.
Is there a (portable) way to do a "rough", low-accuracy reset? One that will reset the whole integer that my requested bit belongs to, avoiding any additional operations?
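For concreteness, the setup described in the question might look like the following minimal sketch. The question does not show its code, so the names `PresenceSet`, `mark`, `contains`, `clear_marked` and `marked_count` are hypothetical and chosen only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical wrapper: a pre-sized std::vector<bool> plus a list of the
// indices that have been set since the last reset.
class PresenceSet {
public:
    explicit PresenceSet(std::size_t size) : bits_(size, false) {}

    // Set bit i, recording its index the first time it is set.
    void mark(std::size_t i) {
        assert(i < bits_.size());
        if (!bits_[i]) {
            bits_[i] = true;
            marked_.push_back(i);
        }
    }

    bool contains(std::size_t i) const {
        assert(i < bits_.size());
        return bits_[i];
    }

    // Reset only the bits that were set: the cost is proportional to the
    // number of set bits, not to the size of the vector.
    void clear_marked() {
        for (std::size_t i : marked_) bits_[i] = false;
        marked_.clear();
    }

    std::size_t marked_count() const { return marked_.size(); }

private:
    std::vector<bool> bits_;
    std::vector<std::size_t> marked_;
};
```

Each `bits_[i] = false` here is the per-bit read-modify-write that the question would like to replace with a cheaper whole-word store.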
Method 1 "Dense":
If you need a "dense" container, I would suggest you use a mixture of vector and bitset.
We store bits as a sequence of bitsets, so we can use bitset::reset on each "chunk" to reset them all. DynamicBitset::resize can be used to make room for the correct number of bits.
#include <bitset>
#include <cstddef>
#include <vector>

class DynamicBitset
{
private:
    static const unsigned BITS_LEN = 64; // or 32, or more?
    typedef std::bitset<BITS_LEN> Bits;
    std::vector<Bits> data_;
public:
    DynamicBitset(std::size_t len = 0) {
        resize(len);
    }
    // reset all bits to false
    void reset() {
        for (auto it = data_.begin(); it != data_.end(); ++it) {
            it->reset(); // we can use the fast bitset::reset :)
        }
    }
    // reset the whole bitset (chunk) that bit i belongs to
    void reset_approx(std::size_t i) {
        data_[i / BITS_LEN].reset();
    }
    // make room for len bits (allocates one spare chunk when len is a
    // multiple of BITS_LEN)
    void resize(std::size_t len) {
        data_.resize(len / BITS_LEN + 1);
    }
    // access a bit: chunk k, offset i - k*BITS_LEN within the chunk
    Bits::reference operator[](std::size_t i) {
        std::size_t k = i / BITS_LEN;
        return data_[k][i - k * BITS_LEN];
    }
    bool operator[](std::size_t i) const {
        std::size_t k = i / BITS_LEN;
        return data_[k][i - k * BITS_LEN];
    }
};
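To make the "low-accuracy" nature of a per-chunk reset concrete, here is a small sketch written directly against std::vector<std::bitset<64>>. The helper names `reset_chunk_of`, `test_bit` and `rough_reset_demo` are hypothetical; they only illustrate that clearing one chunk also clears every neighbouring bit stored in it.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <vector>

typedef std::bitset<64> Chunk;

// Clear the whole 64-bit chunk that contains bit i (a "rough" reset).
inline void reset_chunk_of(std::vector<Chunk>& chunks, std::size_t i) {
    assert(i / 64 < chunks.size());
    chunks[i / 64].reset();
}

// Return true if bit i is set.
inline bool test_bit(const std::vector<Chunk>& chunks, std::size_t i) {
    assert(i / 64 < chunks.size());
    return chunks[i / 64][i % 64];
}

// Bits 3 and 5 share chunk 0; bit 70 lives in chunk 1.
// Resetting "bit 3" roughly also clears bit 5, but leaves bit 70 alone.
inline bool rough_reset_demo() {
    std::vector<Chunk> chunks(2);   // room for 128 bits
    chunks[0].set(3);
    chunks[0].set(5);
    chunks[1].set(70 % 64);
    reset_chunk_of(chunks, 3);
    return !test_bit(chunks, 3) && !test_bit(chunks, 5) && test_bit(chunks, 70);
}
```

This side effect is harmless when every set bit is going to be reset anyway, which is the situation described in the question.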
Method 2 "Sparse":
If you store only very few bits, you can also go with a mixture of map and bitset.
Here we store a chunk only if at least one bit is set in it. This has additional costs for accessing bits, as we need a lookup into a std::map, which is O(log N), but it uses much less memory.
Additionally, the function reset does exactly what you stated in your question: it only touches areas where you have set a bit.
SparseDynamicBitset is a good choice when you have very long sequences of bits which are always false, e.g. for 1000...000100...010, and not for 0101000111010110.
#include <bitset>
#include <cstddef>
#include <map>

class SparseDynamicBitset
{
private:
    static const unsigned BITS_LEN = 64; // ?
    typedef std::bitset<BITS_LEN> Bits;
    std::map<std::size_t, Bits> data_; // chunk index -> chunk
public:
    // reset all "stored" bits to false
    void reset() {
        for (auto it = data_.begin(); it != data_.end(); ++it) {
            it->second.reset(); // uses bitset::reset :)
        }
    }
    // access a bit (creates the chunk if it does not exist yet)
    Bits::reference operator[](std::size_t i) {
        std::size_t k = i / BITS_LEN;
        return data_[k][i - k * BITS_LEN];
    }
    bool operator[](std::size_t i) const {
        std::size_t k = i / BITS_LEN;
        auto it = data_.find(k);
        if (it == data_.end()) {
            return false; // the default
        }
        return it->second[i - k * BITS_LEN];
    }
};
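One subtlety of a std::map-based store is that map::operator[] inserts a default-constructed chunk when the key is missing, so reading an unset bit through a non-const accessor grows the map, while a lookup with find() does not. The sketch below shows the difference on a bare map; the names `ChunkMap`, `test_no_insert`, `test_with_insert` and `insert_on_read_demo` are hypothetical.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <map>

typedef std::map<std::size_t, std::bitset<64> > ChunkMap;

// Read bit i without creating a chunk: find() never inserts.
inline bool test_no_insert(const ChunkMap& chunks, std::size_t i) {
    ChunkMap::const_iterator it = chunks.find(i / 64);
    return it != chunks.end() && it->second[i % 64];
}

// Read bit i through operator[]: this stores an empty chunk if none
// exists yet, so the map grows on a mere read.
inline bool test_with_insert(ChunkMap& chunks, std::size_t i) {
    return chunks[i / 64][i % 64];
}

inline void insert_on_read_demo() {
    ChunkMap m;
    assert(!test_no_insert(m, 1000));
    assert(m.empty());            // nothing was stored
    assert(!test_with_insert(m, 1000));
    assert(m.size() == 1);        // an all-zero chunk now exists
}
```

If memory matters more than reuse of the nodes, calling clear() on the map is another way to reset everything, at the cost of reallocating chunks later.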
Is there a (portable) way to do a "rough", low-accuracy reset? One that will reset the whole integer that my requested bit belongs to, avoiding any additional operations?
No, and no. (Not for that container type.)
FWIW, I wrote a data type that may help you with large sparse bit sets (you'll need to wrap it in an outer type that creates an array of them etc.). The type is 32 bits wide and tracks the on/off status of 1024 bits by using the 32 bits to either (worst case: more than 3 bits set) store a pointer to a vector<unsigned>, or (typically) hold a 2-bit size/count and up to 3 embedded 10-bit indices of set bits. When the vector is not needed, this achieves a 32x storage density improvement over a std::bitset<>, which should help reduce memory cache misses.
NOTES: It's crucial that the size member, in_use, be aligned over the least significant bits of the vector*, such that any legitimate pointer (which will have at least 4-byte alignment) will necessarily have an in_use value of 0. Written for 32-bit apps where uintptr_t is 32 bits wide, but the same concept can be used to create a 64-bit version; the data type needs to use uintptr_t so it can store a pointer-to-vector when needed, so it must match the 32- or 64-bit execution mode.
Code includes tests showing that random operations affect bitsets and Bit_Packer_32s identically.
#include <stddef.h> // size_t
#include <stdint.h> // uintptr_t
#include <stdlib.h> // rand
#include <algorithm>
#include <bitset>
#include <iostream>
#include <set>
#include <vector>
class Bit_Packer_32
{
// Invariants:
// - 0 bits set: u_ == p_ == a == b == c == 0
// - 1 bit set: in_use == 1, a meaningful, b == c == 0
// - 2 bits set: in_use == 2, a & b meaningful, c == 0
// - 3 bits set: in_use == 3, a & b & c meaningful
// - >3 bits: in_use == 0, p_ != 0
// NOTE: differentiating 0 from >3 depends on a == b == c == 0 for former
public:
Bit_Packer_32() : u_(0) { }
class Reference
{
public:
Reference(Bit_Packer_32& p, size_t n) : p_(p), n_(n) { }
Reference& operator=(bool b) { p_.set(n_, b); return *this; }
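// Read through the const operator[]; the non-const overload returns a
// Reference and would make this conversion call itself recursively.
operator bool() const { return static_cast<const Bit_Packer_32&>(p_)[n_]; }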
private:
Bit_Packer_32& p_;
size_t n_;
};
Reference operator[](size_t n) { return Reference(*this, n); }
void set(size_t n)
{
switch (in_use)
{
case 0:
if (p_)
{
if (std::find(p_->begin(), p_->end(), n) == p_->end())
p_->push_back(n);
}
else
{ in_use = 1; a = n; }
break;
case 1: if (a != n) { in_use = 2; b = n; } break;
case 2: if (a != n && b != n) { in_use = 3; c = n; } break;
case 3: if (a == n || b == n || c == n) break;
V* p = new V(4);
(*p)[0] = a; (*p)[1] = b; (*p)[2] = c; (*p)[3] = n;
p_ = p;
}
}
void reset(size_t n)
{
switch (in_use)
{
case 0:
if (p_)
{
V::iterator i = std::find(p_->begin(), p_->end(), n);
if (i != p_->end())
{
p_->erase(i);
if (p_->size() == 3)
{
// faster to copy out w/o erase, but tedious
int p0 = (*p_)[0], p1 = (*p_)[1], p2 = (*p_)[2];
delete p_;
a = p0; b = p1; c = p2;
in_use = 3;
}
}
}
break;
case 1: if (a == n) { u_ = 0; /* in_use = a = 0 */ } break;
case 2: if (a == n) { in_use = 1; a = b; b = 0; }
else if (b == n) { in_use = 1; b = 0; }
break; // never fall through into case 3
case 3: if (a == n) a = c;
else if (b == n) b = c;
else if (c == n) ;
else break;
in_use = 2;
c = 0;
}
}
void reset_all()
{
if (in_use == 0) delete p_;
u_ = 0;
}
size_t count() const { return in_use ? in_use : p_ ? p_->size() : 0; }
void set(size_t n, bool b) { if (b) set(n); else reset(n); }
bool operator[](size_t n) const
{
switch (in_use)
{
case 0:
return p_ && std::find(p_->begin(), p_->end(), n) != p_->end();
case 1: return n == a;
case 2: return n == a || n == b;
case 3: return n == a || n == b || n == c;
}
return false; // not reached: in_use is a 2-bit field and every value is handled above
}
// e.g. operate_on<std::bitset<1024>, Op_Set>()
// operate_on<std::set<unsigned>, Op_Insert>()
// operate_on<std::vector<unsigned>, Op_Push_Back>()
template <typename T, typename Op>
T operate_on(const Op& op = Op()) const
{
T result;
switch (in_use)
{
case 0:
if (p_)
for (V::const_iterator i = p_->begin(); i != p_->end(); ++i)
op(result, *i);
break;
case 3: op(result, c);
case 2: op(result, b);
case 1: op(result, a);
}
return result;
}
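private:
// Declared at class scope: an anonymous union may contain only data members.
typedef std::vector<unsigned> V;
union
{
uintptr_t u_;
V* p_;
// Anonymous struct: a widely supported compiler extension, not standard C++.
struct
{
unsigned in_use : 2;
unsigned a : 10;
unsigned b : 10;
unsigned c : 10;
};
};
};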
struct Op_Insert
{
template <typename T, typename U>
void operator()(T& t, const U& u) const { t.insert(u); }
};
struct Op_Set
{
template <typename T, typename U>
void operator()(T& t, const U& u) const { t.set(u); }
};
struct Op_Push_Back
{
template <typename T, typename U>
void operator()(T& t, const U& u) const { t.push_back(u); }
};
#define TEST(A, B, MSG) \
do { \
bool pass = (A) == (B); \
if (pass) break; \
std::cout << "@" << __LINE__ << ": (" #A " == " #B ") "; \
std::cout << (pass ? "pass\n" : "FAIL\n"); \
std::cout << " (" << (A) << " ==\n"; \
std::cout << " " << (B) << ")\n"; \
std::cout << MSG << std::endl; \
} while (false)
template <size_t N>
std::set<unsigned> to_set(const std::bitset<N>& bs)
{
std::set<unsigned> result;
for (unsigned i = 0; i < N; ++i)
if (bs[i]) result.insert(i);
return result;
}
template <typename T>
std::ostream& operator<<(std::ostream& os, const std::set<T>& s)
{
for (typename std::set<T>::const_iterator i = s.begin(); i != s.end(); ++i)
{
if (i != s.begin()) os << ' ';
os << *i;
}
return os;
}
int main()
{
TEST(sizeof(uintptr_t), 4, "designed for 32-bit uintptr_t");
for (int i = 0; i < 100000; ++i)
{
Bit_Packer_32 bp;
std::bitset<1024> bs;
for (int j = 0; j < 1 + i % 10; ++j)
{
int n = rand() % 1024;
int v = rand() % 2;
bs[n] = v;
bp[n] = v;
// TEST(bp.get_bitset(), bs);
TEST((bp.operate_on<std::set<unsigned>, Op_Insert>()), to_set(bs),
"j " << j << ", n " << n << ", v " << v);
}
}
}
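The inline encoding described above (a 2-bit count plus up to three 10-bit indices in one 32-bit word) can also be sketched with explicit shifts and masks, which avoids relying on how a compiler lays out bit-fields. This is only an illustration of the layout, assuming the count sits in the two least significant bits as the notes require; the function names `pack3`, `packed_count`, `packed_index` and `packed_test` are hypothetical and not part of the code above.

```cpp
#include <cassert>
#include <cstdint>

// Assumed layout of the 32-bit word:
//   bits 0..1    number of stored indices (0..3)
//   bits 2..11   first index, bits 12..21 second, bits 22..31 third
inline std::uint32_t pack3(unsigned count, unsigned a, unsigned b, unsigned c) {
    assert(count <= 3 && a < 1024 && b < 1024 && c < 1024);
    return static_cast<std::uint32_t>(count)
         | (static_cast<std::uint32_t>(a) << 2)
         | (static_cast<std::uint32_t>(b) << 12)
         | (static_cast<std::uint32_t>(c) << 22);
}

inline unsigned packed_count(std::uint32_t w) { return w & 0x3u; }

// Return the k-th stored index (k = 0, 1, 2).
inline unsigned packed_index(std::uint32_t w, unsigned k) {
    assert(k < 3);
    return (w >> (2 + 10 * k)) & 0x3FFu;
}

// True if bit n (0..1023) is recorded in the packed word.
inline bool packed_test(std::uint32_t w, unsigned n) {
    for (unsigned k = 0; k < packed_count(w); ++k)
        if (packed_index(w, k) == n) return true;
    return false;
}
```

Because a 4-byte-aligned pointer always has its two lowest bits clear, a word holding such a pointer reads as a count of 0 under this layout, which is what lets the same 32 bits double as a pointer in the overflow case.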
No, there is no portable way to do so, because std::vector<bool> is not required to pack its elements into bits; that is only a recommendation. If you want to be portable, you might want to implement your own container, based for example on std::vector<uint8_t> or a std::bitset.
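As a rough idea of what "implementing your own" could look like, here is a minimal sketch built on std::vector<std::uint64_t> rather than std::vector<uint8_t>; the word width is an arbitrary choice, and the class name `WordBits` and its member names are hypothetical. It offers both an exact per-bit reset and the whole-word reset asked about in the question.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical hand-rolled bit vector using 64-bit words.
class WordBits {
public:
    explicit WordBits(std::size_t nbits) : words_((nbits + 63) / 64, 0) {}

    void set(std::size_t i) { words_[word(i)] |= mask(i); }
    bool test(std::size_t i) const { return (words_[word(i)] & mask(i)) != 0; }

    // Exact reset: build a mask, then read-modify-write one word.
    void reset(std::size_t i) { words_[word(i)] &= ~mask(i); }

    // "Rough" reset: a single store that clears all 64 bits in i's word.
    void reset_word_of(std::size_t i) { words_[word(i)] = 0; }

    // Full reset: overwrite every word with zero.
    void reset_all() { words_.assign(words_.size(), 0); }

private:
    std::size_t word(std::size_t i) const {
        assert(i / 64 < words_.size());
        return i / 64;
    }
    static std::uint64_t mask(std::size_t i) {
        return static_cast<std::uint64_t>(1) << (i % 64);
    }
    std::vector<std::uint64_t> words_;
};
```

Whether the rough reset is measurably faster than the exact one depends on the workload and hardware, so it is worth profiling both before committing to either.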