Bit Array in C++

When working on Project Euler problems I often need large (> 10**7) bit arrays.

My normal approach is one of:

bool* sieve = new bool[N];

bool sieve[N];

When N = 1,000,000 my program uses 1 megabyte (8 * 1,000,000 bits).

Is there a more efficient way to store bit arrays than bool in C++?

Use std::bitset if N is a compile-time constant; otherwise use std::vector<bool>, as others have mentioned (but don't forget to read this excellent article by Herb Sutter).

A bitset is a special container class that is designed to store bits (elements with only two possible values: 0 or 1, true or false, ...).

The class is very similar to a regular array, but optimized for space allocation: each element occupies only one bit (which is eight times less than the smallest elemental type in C++: char).
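As a minimal sketch of the compile-time case, here is a prime sieve stored in a std::bitset. The names `kLimit` and `count_primes_below_limit` are illustrative and not from the question; the bitset is heap-allocated only because an object of this size may not fit comfortably on the stack.

```cpp
#include <bitset>
#include <cstddef>
#include <memory>

// Illustrative compile-time size; the question mentions N = 1,000,000.
constexpr std::size_t kLimit = 1000000;

// Counts the primes below kLimit with a sieve of Eratosthenes.
// A set bit means "not prime".
std::size_t count_primes_below_limit() {
    // Value-initialised, so every bit starts at 0.
    std::unique_ptr<std::bitset<kLimit>> storage(new std::bitset<kLimit>());
    std::bitset<kLimit>& not_prime = *storage;

    not_prime.set(0);  // 0 is not prime
    not_prime.set(1);  // 1 is not prime
    for (std::size_t i = 2; i * i < kLimit; ++i) {
        if (!not_prime.test(i)) {
            // Cross off every multiple of i, starting at i * i.
            for (std::size_t j = i * i; j < kLimit; j += i) {
                not_prime.set(j);
            }
        }
    }
    // Every bit that is still clear marks a prime.
    return kLimit - not_prime.count();
}
```

Calling `count_primes_below_limit()` returns the number of primes below one million while using roughly one bit per candidate rather than one byte.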

EDIT:

Herb Sutter (in that article) mentions that:

The reason std::vector<bool> is nonconforming is that it pulls tricks under the covers in an attempt to optimize for space: Instead of storing a full char or int for every bool[1] (taking up at least 8 times the space, on platforms with 8-bit chars), it packs the bools and stores them as individual bits (inside, say, chars) in its internal representation.

std::vector<bool> forces a specific optimization on all users by enshrining it in the standard. That's not a good idea; different users have different requirements, and now all users of vector must pay the performance penalty even if they don't want or need the space savings.

EDIT 2:

And if you use Boost, you can use boost::dynamic_bitset (if N is only known at runtime).

For better or for worse, std::vector<bool> will use bits instead of bools, to save space. So just use std::vector like you should have been in the first place.

If N is a constant, you can use std::bitset.

You could look up std::bitset and std::vector<bool>. The latter is often recommended against because, despite the vector in the name, it doesn't really act like a vector of any other kind of object, and in fact doesn't meet the requirements for a container in general. Nonetheless, it can be pretty useful.

OTOH, nothing is going to (at least dependably) store 1 million bool values in less than 1 million bits. It simply can't be done with any certainty. If your bit sets contain a degree of redundancy, there are various compression schemes that might be effective (e.g., LZ*, Huffman, arithmetic), but without some knowledge of the contents, it's impossible to say for certain that they would be. Either of these will, however, normally store each bool/bit in only one bit of storage (plus a little overhead for bookkeeping -- but that's usually a constant, and on the order of bytes to tens of bytes at most).
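To make the "doesn't act like a vector" point concrete, the sketch below checks what operator[] returns. The helper `flip_and_read` is a hypothetical name used only for illustration. Because element access yields a proxy object, code such as `bool* p = &bits[0];` does not compile for std::vector<bool>, although it would for a vector of any other element type.

```cpp
#include <cstddef>
#include <type_traits>
#include <utility>
#include <vector>

// For an ordinary vector, operator[] yields a real reference to the element.
static_assert(
    std::is_same<decltype(std::declval<std::vector<char>&>()[0]), char&>::value,
    "vector<char>::operator[] returns char&");

// For vector<bool>, operator[] yields a proxy class, not bool&.
static_assert(
    !std::is_same<decltype(std::declval<std::vector<bool>&>()[0]), bool&>::value,
    "vector<bool>::operator[] does not return bool&");
static_assert(
    std::is_same<decltype(std::declval<std::vector<bool>&>()[0]),
                 std::vector<bool>::reference>::value,
    "vector<bool>::operator[] returns vector<bool>::reference");

// Hypothetical helper: flips bit i through the proxy and returns the new value.
bool flip_and_read(std::vector<bool>& bits, std::size_t i) {
    std::vector<bool>::reference bit = bits[i];  // proxy for one packed bit
    bit.flip();                                  // writes through to the vector
    return bits[i];                              // proxy converts to bool
}
```

For everyday reads and writes the proxy is invisible, which is why the type remains useful despite the caveat.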

A 'bool' type isn't stored using only 1 bit. From your comment about the size, it seems to use 1 entire byte for each bool.

A C-like way of doing this would be:

uint8_t sieve[(N + 7) / 8]; // array of ceil(N/8) bytes, enough to hold N bits

and then bitwise OR masks together to turn on the bits you want:

sieve[0] = 0x01 | 0x02; //this would turn on the first two bits

In that example, 0x01 and 0x02 are hexadecimal numbers that represent bytes, each with a single bit set.

Yes, you can use bitset.

You might be interested in trying the BITSCAN library as an alternative. An extension for sparse bit sets was recently proposed; I am not sure that applies to your case, but it might.

You can use a byte array and index into that. Bit n lives in byte n/8, at bit position n%8. (In case std::bitset is not available for some reason.)

If N is known at compile time, use std::bitset; otherwise use boost::dynamic_bitset.
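The byte-array indexing described here could be wrapped in a few small helpers. This is only a sketch; the names `bytes_for_bits`, `test_bit`, `set_bit` and `clear_bit` are made up for illustration, and none of them does bounds checking.

```cpp
#include <cstddef>
#include <cstdint>

// Number of bytes needed to hold n bits, rounded up.
constexpr std::size_t bytes_for_bits(std::size_t n) {
    return (n + 7) / 8;
}

// Bit n lives in byte n / 8, at bit position n % 8 within that byte.
inline bool test_bit(const std::uint8_t* bytes, std::size_t n) {
    return ((bytes[n / 8] >> (n % 8)) & 1u) != 0;
}

// Turns bit n on, leaving the other bits of that byte unchanged.
inline void set_bit(std::uint8_t* bytes, std::size_t n) {
    bytes[n / 8] |= static_cast<std::uint8_t>(1u << (n % 8));
}

// Turns bit n off, leaving the other bits of that byte unchanged.
inline void clear_bit(std::uint8_t* bytes, std::size_t n) {
    bytes[n / 8] &= static_cast<std::uint8_t>(~(1u << (n % 8)));
}
```

A buffer declared as `std::uint8_t buf[bytes_for_bits(20)] = {0};` holds 20 bits in 3 bytes, and `set_bit(buf, 19)` touches only the third byte.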

A 'bool' type isn't stored using only 1 bit. From your comment about the size, it seems to use 1 entire byte for each bool.

A C-like way of doing this would be:

uint8_t sieve[(N + 7) / 8]; // array of ceil(N/8) bytes, enough to hold N bits

To read an element of the array:

result = sieve[index / 8] & (1 << (index % 8)); // nonzero if the bit is set

or

result = sieve[index >> 3] & (1 << (index & 7)); // same test, using shift and mask

To set a bit to 1 in the array:

sieve[index >> 3] |= 1 << (index & 7);
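Putting the shift-and-mask expressions to work, the following sketch runs a sieve with N chosen at run time, reading a bit with bitwise `&` and setting one with `|=`. The function name `count_primes_packed` and the choice of std::vector<std::uint8_t> for storage are assumptions made for the example, not part of the answer.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Counts the primes below n using one bit per number, packed into bytes.
// A set bit means "composite".
std::size_t count_primes_packed(std::size_t n) {
    std::vector<std::uint8_t> sieve((n + 7) / 8, 0);  // round up to whole bytes
    std::size_t count = 0;
    for (std::size_t i = 2; i < n; ++i) {
        // Read bit i: byte i >> 3, bit position i & 7.
        if (sieve[i >> 3] & (1u << (i & 7))) {
            continue;  // already crossed off
        }
        ++count;
        // Only cross off multiples while i * i <= n; this form avoids overflow.
        if (i <= n / i) {
            for (std::size_t j = i * i; j < n; j += i) {
                sieve[j >> 3] |= static_cast<std::uint8_t>(1u << (j & 7));
            }
        }
    }
    return count;
}
```

For n = 10**7 the storage is about 1.25 MB, one eighth of what a plain bool array of the same length would typically need.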
