Why is std::bitset<8> 4 bytes big?

It seems for std::bitset<1 to 32>, the size is set to 4 bytes. For sizes 33 to 64, it jumps straight up to 8 bytes. There can't be any overhead because std::bitset<32> is an even 4 bytes.

I can see aligning to byte length when dealing with bits, but why would a bitset need to align to word length, especially for a container most likely to be used in situations with a tight memory budget?

This is under VS2010.

The most likely explanation is that bitset is using a whole number of machine words to store the array.
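To see what a given standard library actually does, a small probe like the following can help. This is only a sketch: `reportSize` and `reportSizes` are hypothetical helper names, and the numbers printed are implementation-defined, so they may differ from the VS2010 figures in the question.

```cpp
#include <bitset>
#include <cstddef>
#include <cstdio>

// Prints the object size the current implementation uses for std::bitset<N>.
// (Casts to unsigned long keep the format string valid on older compilers.)
template <std::size_t N>
void reportSize() {
    std::printf("sizeof(std::bitset<%lu>) = %lu bytes\n",
                static_cast<unsigned long>(N),
                static_cast<unsigned long>(sizeof(std::bitset<N>)));
}

// Probes the sizes on either side of the 32-bit and 64-bit boundaries.
void reportSizes() {
    reportSize<1>();
    reportSize<8>();
    reportSize<32>();
    reportSize<33>();
    reportSize<64>();
    reportSize<65>();
}
```

Calling `reportSizes()` from `main` shows where the size jumps; a jump at a multiple of 32 or 64 bits would be consistent with word-based storage.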

This is probably done for memory bandwidth reasons: it is typically relatively cheap to read/write a word that's aligned at a word boundary. On the other hand, reading (and especially writing!) an arbitrarily-aligned byte can be expensive on some architectures.

Since we're talking about a fixed-size penalty of a few bytes per bitset, this sounds like a reasonable tradeoff for a general-purpose library.

I assume that indexing into the bitset is done by grabbing a 32-bit value and then isolating the relevant bit because this is fastest in terms of processor instructions (working with smaller-sized values is slower on x86). The two indexes needed for this can also be calculated very quickly:

int wordIndex = index >> 5;    // index / 32: which 32-bit word holds the bit
int bitIndex = index & 0x1f;   // index % 32: position of the bit inside that word

And then you can do this, which is also very fast:

unsigned int word = m_pStorage[wordIndex];   // unsigned, so bit 31 is handled correctly
bool bit = ((word >> bitIndex) & 1u) != 0;
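Put together, the indexing described above might look like the following minimal sketch. `WordBitArray` is a hypothetical class written for illustration, assuming 32-bit storage words; it is not the actual `std::bitset` implementation of any vendor.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical fixed-size bit array backed by 32-bit words.
template <std::size_t N>
class WordBitArray {
    static_assert(N > 0, "at least one bit is required");

public:
    // Reads one bit: pick the word, shift the bit down, isolate it.
    bool test(std::size_t index) const {
        assert(index < N);
        const std::size_t wordIndex = index >> 5;    // index / 32
        const std::size_t bitIndex  = index & 0x1f;  // index % 32
        return ((m_storage[wordIndex] >> bitIndex) & 1u) != 0;
    }

    // Writes one bit by OR-ing in or clearing a single-bit mask.
    void set(std::size_t index, bool value = true) {
        assert(index < N);
        const std::size_t wordIndex = index >> 5;
        const std::uint32_t mask = std::uint32_t(1) << (index & 0x1f);
        if (value) {
            m_storage[wordIndex] |= mask;
        } else {
            m_storage[wordIndex] &= ~mask;
        }
    }

private:
    // Round the bit count up to a whole number of 32-bit words.
    static const std::size_t kWords = (N + 31) / 32;
    std::uint32_t m_storage[kWords] = {};  // all bits start cleared
};
```

With this layout a `WordBitArray<8>` occupies one 4-byte word and a `WordBitArray<40>` occupies two, which mirrors the size steps described in the question.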

Also, a maximum waste of 3 bytes per bitset is not exactly a memory concern IMHO. Consider that a bitset is already the most efficient data structure to store this type of information, so you would have to evaluate the waste as a percentage of the total structure size.

For 1025 bits this approach uses 132 bytes instead of 129, for 2.3% overhead (and this goes down as the bitset size goes up). That sounds reasonable considering the likely performance benefits.
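The arithmetic behind those figures can be checked with a few compile-time helpers. The function names are hypothetical, and the sketch assumes 8-bit bytes and 4-byte storage words by default.

```cpp
#include <cstddef>

// Smallest number of whole bytes that can hold `bits` bits.
constexpr std::size_t bytesPacked(std::size_t bits) {
    return (bits + 7) / 8;
}

// Bytes used when storage is rounded up to whole words of `wordBytes` bytes.
constexpr std::size_t bytesInWords(std::size_t bits, std::size_t wordBytes = 4) {
    return ((bits + wordBytes * 8 - 1) / (wordBytes * 8)) * wordBytes;
}

// Extra space spent on word rounding, as a percentage of the packed size.
// Only meaningful for bits >= 1.
constexpr double overheadPercent(std::size_t bits) {
    return 100.0 * (bytesInWords(bits) - bytesPacked(bits)) / bytesPacked(bits);
}

static_assert(bytesPacked(1025) == 129, "129 bytes when packed to bytes");
static_assert(bytesInWords(1025) == 132, "132 bytes when rounded to 32-bit words");
```

The waste is never more than `wordBytes - 1` bytes, so the percentage shrinks as the bit count grows.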

The memory system on modern machines fetches data from memory in whole words; narrower accesses are layered on top by extracting the desired bits. Hence, having bitsets aligned to words makes them a lot faster to handle, because you do not need to mask out the bits you don't need when accessing them. If you do not mask, doing something like

std::bitset<4> foo(0);
if (foo.any()) {   // std::bitset has no implicit conversion to bool
    // ...
}

will most likely fail. Apart from that, I remember reading some time ago that there was a way to cram several bitsets together, but I don't remember the details. I think it was that several bitsets placed together in a structure can take up "shared" memory, which is not applicable to most use cases of bitfields.
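One way to read the masking point: if a 4-bit set shared its storage with neighboring data, an "is any bit set" test would have to mask off the neighbor's bits first. The sketch below assumes a hypothetical layout in which two 4-bit sets share one byte; the function names are made up for illustration.

```cpp
#include <cstdint>

// Hypothetical layout: the low nibble is "our" 4-bit set, the high nibble
// belongs to a neighboring 4-bit set.

// Incorrect test: looks at the whole byte, so the neighbor's bits leak in.
constexpr bool anyLowNibbleUnmasked(std::uint8_t packed) {
    return packed != 0;
}

// Correct test: masks away the neighbor's bits before comparing.
constexpr bool anyLowNibble(std::uint8_t packed) {
    return (packed & 0x0F) != 0;
}

// Our set is empty, the neighbor has bits set (0xA0).
static_assert(anyLowNibbleUnmasked(0xA0), "unmasked test is fooled by the neighbor");
static_assert(!anyLowNibble(0xA0), "masked test correctly reports an empty set");
```

When each bitset owns its storage words and keeps unused bits zero, a plain whole-word comparison is enough and the extra mask can be skipped.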

I saw the same behavior in the AIX and Linux implementations. On AIX, the internal bitset storage is char-based:

typedef unsigned char _Ty;
....
_Ty _A[_Nw + 1];

On Linux, the internal storage is long-based:

typedef unsigned long _WordT;
....
_WordT _M_w[_Nw];

For compatibility reasons, we modified the Linux version to use char-based storage.

Check which implementation you are using inside bitset.h.

Because a 32-bit Intel-compatible processor cannot access bytes individually (or rather, it can, by implicitly applying bit masks and shifts), but only 32-bit words at a time.

If you declare

bitset<4> a,b,c;

then even if the library implements it with char storage, a, b and c will be 32-bit aligned, so the same space is wasted. But the processor will be forced to pre-mask the bytes before letting the bitset code do its own masking.

For this reason MS used an int[1+(N-1)/32] as the container for the bits.
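The array bound quoted above can be evaluated directly to see that it reproduces the sizes reported in the question. This is only a sketch: `wordCount` and `storageBytes` are hypothetical names, and a 4-byte `int` is assumed.

```cpp
#include <cstddef>

// Number of int elements given by the expression 1 + (N - 1) / 32.
// Valid for bits >= 1.
constexpr std::size_t wordCount(std::size_t bits) {
    return 1 + (bits - 1) / 32;
}

// Storage in bytes, assuming each int is 4 bytes wide.
constexpr std::size_t storageBytes(std::size_t bits) {
    return wordCount(bits) * 4;
}

static_assert(storageBytes(1) == 4, "1 to 32 bits fit in one int");
static_assert(storageBytes(8) == 4, "bitset<8> would take 4 bytes");
static_assert(storageBytes(32) == 4, "32 bits still fit in one int");
static_assert(storageBytes(33) == 8, "33 bits need a second int");
static_assert(storageBytes(64) == 8, "64 bits fit in two ints");
```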

Maybe because it's using int by default, and switches to long long if it overflows? (Just a guess...)

If your std::bitset< 8 > was a member of a structure, you might have this:

struct A
{
  std::bitset< 8 > mask;
  void * pointerToSomething;
};

If bitset<8> was stored in one byte (and the structure packed on 1-byte boundaries) then the pointer following it in the structure would be unaligned, which would be A Bad Thing. The only time when it would be safe and useful to have a bitset<8> stored in one byte would be if it was in a packed structure and followed by some other one-byte fields with which it could be packed together. I guess this is too narrow a use case for it to be worthwhile providing a library implementation.

Basically, in your octree, a single byte bitset would only be useful if it was followed in a packed structure by another one to three single-byte members. Otherwise, it would have to be padded to four bytes anyway (on a 32-bit machine) to ensure that the following variable was word-aligned.
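The padding argument can be observed with a stand-in type. In this sketch an `unsigned char` plays the role of a hypothetical one-byte bitset, and the struct names are made up; exact sizes depend on the platform ABI.

```cpp
#include <cstddef>

// A one-byte member followed directly by a pointer.
struct Node {
    unsigned char mask;          // stand-in for a hypothetical 1-byte bitset
    void*         pointerToSomething;
};

// The same one-byte member followed by three more single-byte members.
struct NodeWithNeighbors {
    unsigned char mask;
    unsigned char a, b, c;       // these can occupy what was padding in Node
    void*         pointerToSomething;
};

// The compiler inserts padding after `mask` so the pointer stays aligned.
static_assert(offsetof(Node, pointerToSomething) % alignof(void*) == 0,
              "pointer member is placed on an aligned offset");
static_assert(sizeof(Node) % alignof(void*) == 0,
              "struct size is a multiple of the pointer alignment");
```

On common 32-bit and 64-bit ABIs both structs typically end up the same size, which is why shrinking the first member to one byte saves nothing unless other small members follow it.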