Why is Java BitSet packed in 6 bytes?

Question

I was looking into BitSet and the following points are not clear to me:

When we pass the number of bits, there is a division by 6. Why is 6 being used and not some power of 2?
When initializing the underlying array, why is there first a subtraction of 1 before the division by 6, followed by an addition of 1?

Answer 1

I'm assuming you are asking about this bit of code in the JDK:

private static int wordIndex(int bitIndex) {
    return bitIndex >> ADDRESS_BITS_PER_WORD; // ADDRESS_BITS_PER_WORD is 6, question 1
}

public BitSet(int nbits) {
    // nbits can't be negative; size 0 is OK
    if (nbits < 0)
        throw new NegativeArraySizeException("nbits < 0: " + nbits);

    initWords(nbits);
    sizeIsSticky = true;
}

private void initWords(int nbits) {
    words = new long[wordIndex(nbits-1) + 1]; // question 2
}

initWords initialises a long[] to back the bits, essentially storing the bits in "words" of 64 bits each. Note that this seems to be an implementation detail. How long should this long[] be? It should be the word index of the last word + 1, because indices are zero-based.

What is the index of the last word? The wordIndex method can tell us the word index of a bit, so if we give it the index of the last bit, nbits - 1 (again because indices are zero-based), it will give us what we want. This should answer your second question.

How does wordIndex find the word index? There are 64 bits in a long, so we just need to divide bitIndex by 64. What's another way of dividing by 64? Shift right by 6 bits, since 64 = 2 to the power 6. So the code is not dividing by 6 at all: the 6 is a shift distance, and the divisor is the power of two 2^6 = 64. See this post for more info.
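The equivalence between the shift and the division can be checked with a small sketch. The class name WordIndexDemo is made up for illustration; wordIndex and ADDRESS_BITS_PER_WORD mirror the JDK snippet quoted above, but this is a standalone copy, not the JDK code itself.

```java
public class WordIndexDemo {
    // 2^6 = 64 bits per long, so 6 address bits select a bit inside a word.
    static final int ADDRESS_BITS_PER_WORD = 6;

    // Index of the 64-bit word that holds the given bit.
    static int wordIndex(int bitIndex) {
        return bitIndex >> ADDRESS_BITS_PER_WORD;
    }

    public static void main(String[] args) {
        // For non-negative indices the shift and the division agree.
        for (int bit : new int[] {0, 5, 63, 64, 66, 127, 128}) {
            System.out.println("bit " + bit
                    + ": >> 6 gives " + wordIndex(bit)
                    + ", / 64 gives " + (bit / 64));
        }
    }
}
```

Bits 0 to 63 land in word 0, bits 64 to 127 in word 1, and so on.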

Answer 2

I suppose you are asking about the implementation of this class?

It says so in a comment in the source file:

    /*
     * BitSets are packed into arrays of "words."  Currently a word is
     * a long, which consists of 64 bits, requiring 6 address bits.
     * The choice of word size is determined purely by performance concerns.
     */

So from a given bit number, the lower 6 bits are used for addressing a bit within a 64-bit word, and the remaining bits are used for addressing the word.
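As a rough sketch of that split, the bit number can be cut into its two parts with a shift and a mask. The class name BitAddressDemo and the helper bitOffset are hypothetical names chosen for this example; only the shift by 6 comes from the quoted source.

```java
public class BitAddressDemo {
    static final int ADDRESS_BITS_PER_WORD = 6;
    // Mask with the lower 6 bits set: binary 111111 = 63.
    static final int BIT_INDEX_MASK = (1 << ADDRESS_BITS_PER_WORD) - 1;

    // Upper bits of the bit number: which word.
    static int wordIndex(int bitIndex) {
        return bitIndex >> ADDRESS_BITS_PER_WORD;
    }

    // Lower 6 bits of the bit number: which bit inside that word.
    static int bitOffset(int bitIndex) {
        return bitIndex & BIT_INDEX_MASK;
    }

    public static void main(String[] args) {
        for (int bit : new int[] {0, 63, 64, 200}) {
            System.out.println("bit " + bit + " -> word " + wordIndex(bit)
                    + ", offset " + bitOffset(bit));
        }
    }
}
```

For bit 200, for instance, this gives word 3 and offset 8, since 3 * 64 + 8 = 200.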

For point 2, I suppose you are talking about

wordIndex(nbits-1) + 1

where wordIndex(bitIndex) is

bitIndex >> ADDRESS_BITS_PER_WORD

Suppose you want to initialize a BitSet with initially 0 entries. Then you need an array size of 0.
Suppose you want to initialize a BitSet with initially 1 to 64 entries. Then you need an array size of 1.
Suppose you want to initialize a BitSet with initially 65 to 128 entries. Then you need an array size of 2.

And so on.

This means you map the original range (1-64, 65-128) to "one less" (0-63, 64-127), divide by 64 (0, 1) and increase the result again (1, 2) to get the number of words needed in the array.
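The mapping above can be tried out with a short sketch. The class WordCountDemo and the method wordsNeeded are hypothetical names; the formula is the one from initWords. The comparison with BitSet.size() assumes an OpenJDK-style implementation in which size() reports the allocated words times 64, which is an implementation detail rather than a guarantee.

```java
import java.util.BitSet;

public class WordCountDemo {
    // Same arithmetic as wordIndex(nbits - 1) + 1 in the quoted source.
    static int wordsNeeded(int nbits) {
        return ((nbits - 1) >> 6) + 1;
    }

    public static void main(String[] args) {
        for (int nbits : new int[] {0, 1, 64, 65, 128, 129}) {
            // size() is expected to be words.length * 64 on OpenJDK (assumption).
            int allocatedBits = new BitSet(nbits).size();
            System.out.println(nbits + " bits -> " + wordsNeeded(nbits)
                    + " word(s), BitSet.size() = " + allocatedBits);
        }
    }
}
```

The case of 0 entries works because >> is an arithmetic shift: -1 >> 6 is -1, so the result is 0 words. With a plain division, (0 - 1) / 64 + 1 would give 1, because Java integer division truncates toward zero.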

To demonstrate both:

Suppose you want a BitSet with 128 entries. You initialize it and you get an array with two 64-bit entries. Why?

That's because each word can hold 64 bits, so in order to hold 128 bits, you need 2 array entries:

(128-1)/64 + 1 = 127/64 + 1 = 1 + 1 = 2. (Remember that integer division discards the remainder.)

Now, you want to set bits 5, 13 and 66.

Bits 5 and 13 are fine: you just set bits 5 and 13 in the word at index 0.

But what do you do with 66? Each word has only 64 bits (0..63).

Well, in this situation, you subtract 64 for each step you make in the array. So you go to the word at index 1 and, to compensate, you go from 66 to 2.

That's exactly what happens with these bit manipulations: from each of these bit indexes, the lower 6 bits are taken and used as the bit address within the respective word, and the remaining upper bits select the word.

 5 = 0 000101 = 0/5
13 = 0 001101 = 0/13
66 = 1 000010 = 1/2

So

5 is set by setting bit 5 in word 0.
13 is set by setting bit 13 in word 0.
66 is set by setting bit 2 in word 1.
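This can be observed on a real BitSet through toLongArray(), which returns a copy of the words in use. The following is a minimal sketch; the class BitSetWordsDemo and the helper wordsFor are hypothetical names for this example.

```java
import java.util.BitSet;

public class BitSetWordsDemo {
    // Sets the given bits in a 128-bit BitSet and returns its backing words.
    static long[] wordsFor(int... bits) {
        BitSet set = new BitSet(128);
        for (int bit : bits) {
            set.set(bit);
        }
        return set.toLongArray();
    }

    public static void main(String[] args) {
        long[] words = wordsFor(5, 13, 66);
        for (int i = 0; i < words.length; i++) {
            System.out.println("word " + i + " = "
                    + Long.toBinaryString(words[i]));
        }
    }
}
```

Word 0 should show bits 5 and 13 set (binary 10000000100000), and word 1 should show only bit 2 set (binary 100).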
