Why is a Java BitSet packed in 6 bytes?

I was looking into BitSet and the following points are not clear to me:

  1. When we pass the number of bits, there is a division by 6. Why is 6 being used and not some power of 2?
  2. When initializing the underlying array why is there first a subtraction by 1 before the division by 6 followed by an addition of 1?

I'm assuming you are asking about this bit of code in the JDK:

private static int wordIndex(int bitIndex) {
    return bitIndex >> ADDRESS_BITS_PER_WORD; // ADDRESS_BITS_PER_WORD is 6, question 1
}

public BitSet(int nbits) {
    // nbits can't be negative; size 0 is OK
    if (nbits < 0)
        throw new NegativeArraySizeException("nbits < 0: " + nbits);

    initWords(nbits);
    sizeIsSticky = true;
}

private void initWords(int nbits) {
    words = new long[wordIndex(nbits-1) + 1]; // question 2
}

initWords initialises a long[] to back the bits, essentially storing the bits into "words" of 64 bits. Note that this seems to be an implementation detail. How long should this long[] be? Well, it should be the word index of the last word + 1, because indices are zero-based.

What is the index of the last word? Well, the wordIndex method can tell us the word index of a bit, so if we give it the index of the last bit, nbits - 1 (again because indices are zero-based), it will give us what we want. This should answer your second question.

How does wordIndex find the word index? Well, there are 64 bits in a long, so we just need to divide the bitIndex by 64. What's another way of dividing by 64? Shift right by 6 bits, since 64 = 2 to the power 6. So the 6 is not a divisor but a shift distance, and the actual divisor, 64, is a power of 2. See this post for more info.
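To make the equivalence concrete, here is a small sketch comparing a right shift by 6 with integer division by 64. The class and method names are invented for illustration; only ADDRESS_BITS_PER_WORD and the shift itself come from the JDK snippet above.

```java
final class ShiftVersusDivide {
    // Same value as the constant in the JDK snippet quoted above.
    static final int ADDRESS_BITS_PER_WORD = 6;

    // Word index computed the way the JDK snippet does it.
    static int wordIndexByShift(int bitIndex) {
        return bitIndex >> ADDRESS_BITS_PER_WORD;
    }

    // Word index computed with plain integer division.
    static int wordIndexByDivision(int bitIndex) {
        return bitIndex / 64;
    }

    public static void main(String[] args) {
        // For non-negative bit indexes both columns agree.
        for (int bitIndex : new int[] {0, 63, 64, 127, 128, 1000}) {
            System.out.println(bitIndex + " -> shift: " + wordIndexByShift(bitIndex)
                    + ", division: " + wordIndexByDivision(bitIndex));
        }
    }
}
```

The two only agree for non-negative inputs: -1 >> 6 is -1, while -1 / 64 is 0 in Java.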

I suppose you are asking about the implementation of this class?

It says so in a comment in the source file:

    /*
     * BitSets are packed into arrays of "words."  Currently a word is
     * a long, which consists of 64 bits, requiring 6 address bits.
     * The choice of word size is determined purely by performance concerns.
     */

So from a given bit number, the lower 6 bits are used for addressing a bit within a 64-bit word, and the remaining bits are for addressing the word.
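A minimal sketch of that split, assuming 64-bit words as described in the comment. The class and method names are hypothetical and not part of the JDK. The last method relies on a rule of the Java language: for a shift of a long, only the low 6 bits of the shift distance are used, so an explicit mask is not strictly needed.

```java
final class BitAddress {
    // Upper bits of the bit index: which long in the array.
    static int wordIndex(int bitIndex) {
        return bitIndex >> 6;
    }

    // Lower 6 bits of the bit index: which bit inside that long (0..63).
    static int bitOffset(int bitIndex) {
        return bitIndex & 63;
    }

    // Mask with only the addressed bit set; the shift distance is
    // implicitly reduced to its low 6 bits for long operands.
    static long bitMask(int bitIndex) {
        return 1L << bitIndex;
    }

    public static void main(String[] args) {
        int bitIndex = 66;
        System.out.println("word " + wordIndex(bitIndex) + ", bit " + bitOffset(bitIndex));
    }
}
```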


For point 2, I suppose you mean

wordIndex(nbits-1) + 1

where wordIndex(bitIndex) is

bitIndex >> ADDRESS_BITS_PER_WORD

  • Suppose you want to initialize a BitSet with initially 0 entries. Then you need an array size of 0.
  • Suppose you want to initialize a BitSet with initially 1 to 64 entries. Then you need an array size of 1.
  • Suppose you want to initialize a BitSet with initially 65 to 128 entries. Then you need an array size of 2.

And so on.

This means you map the original range (1-64, 65-128) to "one less" (0-63, 64-127), divide by 64 (0, 1), and add 1 again (1, 2) to get the number of words needed in the array.
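The mapping can be checked with a short sketch. The helper name wordsNeeded is made up for this example; the expression inside it mirrors wordIndex(nbits-1) + 1. Note that the 0 case works out because of the shift: -1 >> 6 is -1, so the result is 0 words, whereas plain integer division would give (-1)/64 + 1 = 1.

```java
final class WordCount {
    // Number of 64-bit words needed to hold nbits bits (nbits >= 0).
    static int wordsNeeded(int nbits) {
        return ((nbits - 1) >> 6) + 1;
    }

    public static void main(String[] args) {
        for (int nbits : new int[] {0, 1, 64, 65, 128, 129}) {
            System.out.println(nbits + " bits -> " + wordsNeeded(nbits) + " word(s)");
        }
    }
}
```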


To demonstrate both:

Suppose you want a BitSet with 128 entries. You initialize it and you get an array with two 64-bit entries. Why?

That's because each word can hold 64 bits, so in order to hold 128 bits, you need 2 array entries:

(128-1)/64 + 1 = 127/64 + 1 = 1 + 1 = 2. (Remember that integer divisions go towards the lower value.)

Now, you want to set the bits 5, 13 and 66.

Bit 5 and 13 are fine - you just set bits 5 and 13 in the word at index 0.

But what do you do with the 66? Each word has only 64 bits. (0..63)

Well, in this situation, you subtract 64 for each step you make in the array. So you go to the word at index 1 and, for "compensation", you go from 66 to 2.

That's exactly what happens with these bit manipulations: from each of these bit indexes, the lower 6 bits are taken and used as bit address in the respective word.

5  = 0 000101 = 0/5
13 = 0 001101 = 0/13
66 = 1 000010 = 1/2

So

  • 5 is set by setting bit 5 in the word 0.
  • 13 is set by setting bit 13 in the word 0.
  • 66 is set by setting bit 2 in the word 1.
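The same walkthrough can be reproduced with java.util.BitSet itself. This sketch uses only the public API (the constructor, set, and toLongArray, available since Java 7) and does not touch the private words field; the class and helper names are invented for the example.

```java
import java.util.BitSet;

final class BitSetWordsDemo {
    // Builds a BitSet sized for nbits, sets the given bits,
    // and returns the backing words as exposed by toLongArray().
    static long[] wordsAfterSetting(int nbits, int... bitIndexes) {
        BitSet bits = new BitSet(nbits);
        for (int bitIndex : bitIndexes) {
            bits.set(bitIndex);
        }
        return bits.toLongArray();
    }

    public static void main(String[] args) {
        long[] words = wordsAfterSetting(128, 5, 13, 66);
        for (int i = 0; i < words.length; i++) {
            System.out.println("word " + i + " = " + Long.toBinaryString(words[i]));
        }
    }
}
```

Word 0 should have bits 5 and 13 set, and word 1 should have only bit 2 set. Keep in mind that toLongArray returns only the words up to the highest set bit, so its length reflects the bits that are set rather than the capacity requested in the constructor.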
