Bit Array in C++
When working on Project Euler problems I often need large (> 10**7) bit arrays.
My normal approach is one of:
bool* sieve = new bool[N];
bool sieve[N];
When N = 1,000,000 my program uses 1 megabyte (8 * 1,000,000 bits).
Is there a more efficient way to store bit arrays than bool in C++?
Use std::bitset (if N is a constant); otherwise use std::vector<bool>, as others have mentioned (but don't forget to read this excellent article by Herb Sutter).
A bitset is a special container class that is designed to store bits (elements with only two possible values: 0 or 1, true or false, ...).
The class is very similar to a regular array, but optimized for space allocation: each element occupies only one bit (an eighth of the space taken by the smallest elemental type in C++, char, on platforms with 8-bit chars).
EDIT:
Herb Sutter (in that article) mentions that
The reason std::vector<bool> is nonconforming is that it pulls tricks under the covers in an attempt to optimize for space: Instead of storing a full char or int for every bool (taking up at least 8 times the space, on platforms with 8-bit chars), it packs the bools and stores them as individual bits (inside, say, chars) in its internal representation.
std::vector<bool> forces a specific optimization on all users by enshrining it in the standard. That's not a good idea; different users have different requirements, and now all users of vector must pay the performance penalty even if they don't want or need the space savings.
EDIT 2:
And if you have used Boost, you can use boost::dynamic_bitset (if N is known at runtime).
For better or for worse, std::vector<bool> will use bits instead of bools, to save space. So just use std::vector like you should have been in the first place.
If N is a constant, you can use std::bitset.
You could look up std::bitset and std::vector<bool>. The latter is often recommended against, because despite the vector in the name, it doesn't really act like a vector of any other kind of object, and in fact doesn't meet the requirements for a container in general. Nonetheless, it can be pretty useful.
OTOH, nothing is going to (at least dependably) store 1 million bool values in less than 1 million bits. It simply can't be done with any certainty. If your bit sets contain a degree of redundancy, there are various compression schemes that might be effective (e.g., LZ*, Huffman, arithmetic), but without some knowledge of the contents, it's impossible to say for certain that they would be. Either std::bitset or std::vector<bool> will, however, normally store each bool/bit in only one bit of storage (plus a little overhead for bookkeeping, but that's usually a constant, and on the order of bytes to tens of bytes at most).
A 'bool' type isn't stored using only 1 bit. From your comment about the size, it seems to use 1 entire byte for each bool.
A C-like way of doing this would be:
uint8_t sieve[(N + 7) / 8]; // array of N/8 bytes, rounded up so a final partial byte is included
and then bitwise OR masks together to set the bits you want:
sieve[0] = 0x01 | 0x02; // this turns on the first two bits
In that example, 0x01 and 0x02 are hexadecimal constants, each a mask with a single bit set.
Yes, you can use bitset.
You can use a byte array and index into that. Index n would be in byte index n/8, bit # n%8. (In case std::bitset is not available for some reason.)
If N is known at compile time, use std::bitset; otherwise use boost::dynamic_bitset.
A 'bool' type isn't stored using only 1 bit. From your comment about the size, it seems to use 1 entire byte for each bool.
A C-like way of doing this would be:
uint8_t sieve[(N + 7) / 8]; // array of N/8 bytes, rounded up so a final partial byte is included
To read an element of the array (the result is non-zero if the bit is set):
result = sieve[index / 8] & (1 << (index % 8));
or
result = sieve[index >> 3] & (1 << (index & 7));
To set a bit to 1 in the array:
sieve[index >> 3] |= 1 << (index & 7);