
Why does Java's BitSet internally store a long array but use int for the set method?

According to the BitSet implementation, it internally uses an array of longs:

/**
 * The internal field corresponding to the serialField "bits".
 */
private long[] words;

But for the set method it uses int:

public void set(int bitIndex) {...}

So basically, since a long[] can have up to 2^31 - 1 elements of 64 bits each, we can store (2^31 - 1) * 64 = 2,147,483,647 * 64 = 137,438,953,408 bits, but using int indexing we have access only to the first 2,147,483,648 bits.

Which means that 137,438,953,408 - 2,147,483,648 = 135,291,469,760 bits are unavailable.
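The arithmetic above can be checked with a few lines of Java. This is only a sketch: the class and method names are made up for illustration, and it assumes a long[] can have up to Integer.MAX_VALUE elements (real JVMs typically cap array length slightly below that).

```java
public class BitSetCapacity {
    // Upper bound on the bits a long[] could hold, assuming (hypothetically)
    // that the array may have up to Integer.MAX_VALUE elements.
    static long maxBitsInLongArray() {
        return (long) Integer.MAX_VALUE * Long.SIZE;
    }

    // Number of distinct bit indexes an int parameter can address (0 .. 2^31 - 1).
    static long bitsAddressableByInt() {
        return (long) Integer.MAX_VALUE + 1;
    }

    public static void main(String[] args) {
        long stored = maxBitsInLongArray();
        long reachable = bitsAddressableByInt();
        System.out.println(stored);             // 137438953408
        System.out.println(reachable);          // 2147483648
        System.out.println(stored - reachable); // 135291469760
    }
}
```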

But if the developers of this class had used long instead of int for bit indexing, it would solve all these problems, since with long we can navigate through 2^63 - 1 = 9,223,372,036,854,775,807 bits.

It does not make any sense, even from a performance point of view.

What is the reasoning behind using int instead of long for indexing and missing out on billions of bits?

PS: One could say that the problem is the 2 GiB heap size, but today that is no longer an issue.

The documentation of java.util.BitSet states:

The bits of a BitSet are indexed by nonnegative integers.

This is what it is supposed to do, so no long indexes are needed.

That its internal data structure could support more than 2^31 individual bits is an implementation detail that has no relevance for the public interface of the class (they could have used a boolean[] array and the class would still work, albeit with a bigger memory footprint and more runtime for some methods).


The question remains: will the public interface of this class change to support long indexes?

This is highly unlikely, because supporting long indexes would mean that methods like

  • int cardinality()
  • int nextClearBit(int fromIndex) (and the similar methods for the next/previous clear/set bit)
  • int size()
  • IntStream stream()

would also need to be changed, which would break existing code.
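To illustrate how deeply int is baked into the public interface, here is a small sketch that calls the methods listed above and stores each result in an int. The class name IntApiDemo and the helper summarize are hypothetical; the BitSet methods are the real ones from java.util.

```java
import java.util.BitSet;

public class IntApiDemo {
    // Collects a few values from the int-based public API of BitSet.
    static int[] summarize(BitSet bits) {
        int count = bits.cardinality();        // number of set bits
        int firstClear = bits.nextClearBit(0); // lowest clear bit at or after index 0
        int capacity = bits.size();            // bits of space currently in use
        int indexSum = bits.stream().sum();    // stream() is an IntStream of set indexes
        return new int[] {count, firstClear, capacity, indexSum};
    }

    public static void main(String[] args) {
        BitSet bits = new BitSet();
        bits.set(0);
        bits.set(1);
        bits.set(5);
        int[] r = summarize(bits);
        System.out.println(r[0]); // 3
        System.out.println(r[1]); // 2
        System.out.println(r[2]); // 64
        System.out.println(r[3]); // 6
    }
}
```

Changing any of these return types to long would break callers that assign the result to an int, as this example does.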

The only way I can think of to get a BitSet-like class with long indexes would be an additional class BigBitSet (or LongBitSet, or whatever you like), so that people needing bit sets with more than 2^31 bits could switch to that new class.

Whether such a class would ever be added to the java.util package is another question. For that you would have to convince the JCP executive board that this is an important addition that fills a gaping hole in the current Java ecosystem.
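For illustration only, such a class could be sketched on top of the existing BitSet. Nothing below exists in the JDK: the name LongBitSet, the segment size, and the map-of-segments design are all assumptions made for this example, which supports only set, get and cardinality.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: a sparse bit set with long indexes, built from
// ordinary int-indexed BitSet segments. Not part of java.util.
public class LongBitSet {
    private static final int SEGMENT_BITS = 20; // each segment covers 2^20 bit indexes
    private static final long SEGMENT_MASK = (1L << SEGMENT_BITS) - 1;

    private final Map<Long, BitSet> segments = new HashMap<>();

    public void set(long bitIndex) {
        checkIndex(bitIndex);
        segments.computeIfAbsent(bitIndex >>> SEGMENT_BITS, k -> new BitSet())
                .set((int) (bitIndex & SEGMENT_MASK));
    }

    public boolean get(long bitIndex) {
        checkIndex(bitIndex);
        BitSet segment = segments.get(bitIndex >>> SEGMENT_BITS);
        return segment != null && segment.get((int) (bitIndex & SEGMENT_MASK));
    }

    // Returns a long, because the total can exceed Integer.MAX_VALUE.
    public long cardinality() {
        long total = 0;
        for (BitSet segment : segments.values()) {
            total += segment.cardinality();
        }
        return total;
    }

    private static void checkIndex(long bitIndex) {
        if (bitIndex < 0) {
            throw new IndexOutOfBoundsException("bitIndex < 0: " + bitIndex);
        }
    }

    public static void main(String[] args) {
        LongBitSet bits = new LongBitSet();
        bits.set(5_000_000_000L); // beyond the int range
        System.out.println(bits.get(5_000_000_000L)); // true
        System.out.println(bits.get(5_000_000_001L)); // false
        System.out.println(bits.cardinality());       // 1
    }
}
```

Note that cardinality() has to return long here, which is exactly the kind of signature change that could not be made to the existing BitSet without breaking callers.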

Each chunk of 64 bits is packed into one long, rather than one long being used per bit index. With an int index, the long[] words array therefore needs 33,554,432 elements (268,435,456 bytes) when calling set(2147483647), or just one long when calling only bitset.set(1). Example in jshell:

BitSet b = new BitSet();
b.size();
==> 64 (i.e. words has length 1 and can store 64 bits)
b.set(1);
b.size();
==> 64 (i.e. words still has length 1)
b.set(64);
b.size();
==> 128 (i.e. words now has length 2 and can store up to 128 bits)
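The packing described above comes down to a shift by 6, since 2^6 = 64 bits fit in one long. The sketch below reproduces that arithmetic; the class WordMath and its methods are hypothetical helpers, and the results are the minimum the array must hold (a real BitSet may allocate more when it grows).

```java
public class WordMath {
    // Index of the long that holds the given bit (64 bits per long).
    static int wordIndex(int bitIndex) {
        return bitIndex >> 6;
    }

    // Minimum number of longs needed so that highestBitIndex is addressable.
    static int wordsNeeded(int highestBitIndex) {
        return wordIndex(highestBitIndex) + 1;
    }

    // Minimum payload size of that array in bytes (8 bytes per long).
    static long bytesNeeded(int highestBitIndex) {
        return (long) wordsNeeded(highestBitIndex) * Long.BYTES;
    }

    public static void main(String[] args) {
        System.out.println(wordsNeeded(1));                 // 1
        System.out.println(wordsNeeded(64));                // 2
        System.out.println(wordsNeeded(Integer.MAX_VALUE)); // 33554432
        System.out.println(bytesNeeded(Integer.MAX_VALUE)); // 268435456
    }
}
```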

Usually you use bit sets to index into something else. Let's say you use this bitset to index into an array.

BitSet b = new BitSet();
b.set(2147483647);
ArrayList<X> items = new ArrayList<X>();
// ...add a looot of elements to the ArrayList...
// then:
X item = items.get(b.nextSetBit(0));

To make this work, the array list would have to contain 2,147,483,648 elements, which is one more than an int-indexed ArrayList can hold. Even assuming each element requires only 1 byte of storage, it would use at least 2 GiB of RAM.
