According to the BitSet implementation, it internally uses an array of longs:
/**
* The internal field corresponding to the serialField "bits".
*/
private long[] words;
But for the set method it uses an int:
public void set(int bitIndex) {...}
So in principle a BitSet could store (2^31 - 1) * 64 = 2,147,483,647 * 64 = 137,438,953,408 bits (the maximum array length times 64 bits per long), but with int indexing we have access only to the first 2^31 = 2,147,483,648 bits. That means 137,438,953,408 - 2,147,483,648 = 135,291,469,760 bits are unavailable.
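The 64-bits-per-long packing is visible in the JDK's own index arithmetic. Here is a minimal sketch (mirroring the private wordIndex helper in the BitSet source; the class name here is made up) showing that an int bit index can never reach beyond word 33,554,431, even though a long[] could legally have up to 2^31 - 1 elements:

```java
// Sketch mirroring BitSet's index arithmetic: each long holds 64 bits,
// so a bit index maps to words[bitIndex / 64].
public class WordIndexDemo {
    private static final int ADDRESS_BITS_PER_WORD = 6; // log2(64)

    static int wordIndex(int bitIndex) {
        return bitIndex >> ADDRESS_BITS_PER_WORD; // same as bitIndex / 64
    }

    public static void main(String[] args) {
        System.out.println(wordIndex(63));                // 0: last bit of word 0
        System.out.println(wordIndex(64));                // 1: first bit of word 1
        System.out.println(wordIndex(Integer.MAX_VALUE)); // 33554431
    }
}
```

So even though the words array could in theory hold roughly 137 billion bits, int indexing stops at word 33,554,431.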
But if the developers of this class had used long instead of int for bit indexing, that would solve the problem, since with long we could address up to 2^63 - 1 = 9,223,372,036,854,775,807 bits.
The restriction does not even make sense from a performance point of view. What is the reasoning behind using int instead of long for indexing and giving up billions of bits?
PS: One could say that the problem is the 2 GiB heap size limit, but today that is no longer an issue.
The documentation of java.util.BitSet states:

The bits of a BitSet are indexed by nonnegative integers.
This is what it is supposed to do, so no long indexes are needed. That its internal data structure could support more than 2^31 individual bits is an implementation detail with no relevance for the public interface of the class (they could have used a boolean[] array and the class would still work, albeit with a bigger memory footprint and more runtime for some methods).
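To illustrate that point, here is a hypothetical boolean[]-backed variant (not a real JDK class; the name and growth policy are assumptions): it honors the same nonnegative-int indexing contract, just at a cost of at least one byte per bit instead of one bit:

```java
// Hypothetical boolean[]-backed bit set: same int-indexed contract as
// java.util.BitSet, but each bit costs at least a whole byte.
public class BooleanArrayBitSet {
    private boolean[] bits = new boolean[64];

    public void set(int bitIndex) {
        if (bitIndex >= bits.length) {
            // grow the backing array on demand, like BitSet's words array
            bits = java.util.Arrays.copyOf(bits, Math.max(bits.length * 2, bitIndex + 1));
        }
        bits[bitIndex] = true;
    }

    public boolean get(int bitIndex) {
        return bitIndex < bits.length && bits[bitIndex];
    }

    public int cardinality() {
        int n = 0;
        for (boolean b : bits) if (b) n++;
        return n;
    }
}
```

The public behavior is the same as far as the contract goes; only the memory footprint and the speed of bulk operations differ.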
The question remains: will the public interface of this class change to support long indexes? This is highly unlikely, because supporting long indexes would mean that methods like

int cardinality()
int nextClearBit(int fromIndex)
(and similar methods: next/previous clear/set bit)
int size()
IntStream stream()

would also need to be changed, which would break existing code.
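To see why widening those return types breaks callers, consider existing code like the following (a made-up but typical caller); if cardinality() or nextSetBit() suddenly returned long, these int assignments would no longer compile:

```java
import java.util.BitSet;

// Typical existing caller code that assigns the int-returning results
// directly; widening the return types to long would break it.
public class IntApiCaller {
    public static void main(String[] args) {
        BitSet b = new BitSet();
        b.set(3);
        b.set(7);
        int count = b.cardinality(); // 2 bits set
        int first = b.nextSetBit(0); // lowest set index: 3
        System.out.println(count + " " + first); // prints "2 3"
    }
}
```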
The only way I can imagine a BitSet-like class with long indexes is an additional class BigBitSet (or LongBitSet, or whatever you like), so that people needing bit sets with more than 2^31 bits could switch to that new class. Whether such a class would ever be added to the java.util package is another question - for that you would have to convince the JCP executive board that this is an important addition / a gaping hole in the current Java ecosystem.
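As a rough idea of what such a class could look like, here is a minimal sketch of a hypothetical LongBitSet (the name, the paging scheme, and the page size are all assumptions, not an existing API): it splits a long bit index across lazily allocated long[] pages so indexes beyond 2^31 become addressable:

```java
// Hypothetical LongBitSet sketch: an outer array of lazily allocated
// long[] pages, each covering 2^20 longs = 2^26 bits.
public class LongBitSet {
    private static final int PAGE_WORDS = 1 << 20;               // longs per page
    private static final long BITS_PER_PAGE = (long) PAGE_WORDS * 64;

    private final long[][] pages;

    public LongBitSet(long nBits) {
        int nPages = (int) ((nBits + BITS_PER_PAGE - 1) / BITS_PER_PAGE);
        pages = new long[nPages][]; // pages stay null until touched
    }

    public void set(long bitIndex) {
        int page = (int) (bitIndex / BITS_PER_PAGE);
        long inPage = bitIndex % BITS_PER_PAGE;
        if (pages[page] == null) pages[page] = new long[PAGE_WORDS];
        // Java's << on a long uses only the low 6 bits of the shift count,
        // so 1L << inPage selects bit (inPage % 64) within the word.
        pages[page][(int) (inPage >> 6)] |= 1L << inPage;
    }

    public boolean get(long bitIndex) {
        int page = (int) (bitIndex / BITS_PER_PAGE);
        long inPage = bitIndex % BITS_PER_PAGE;
        long[] words = pages[page];
        return words != null && (words[(int) (inPage >> 6)] & (1L << inPage)) != 0;
    }

    public static void main(String[] args) {
        LongBitSet s = new LongBitSet(4_000_000_000L);
        s.set(3_000_000_000L); // an index far beyond Integer.MAX_VALUE
        System.out.println(s.get(3_000_000_000L)); // true
        System.out.println(s.get(3_000_000_001L)); // false
    }
}
```

A real class would of course also need the cardinality/iteration API in long form, which is exactly the compatibility problem described above.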
Each chunk of 64 bits is packed into one long; it is not one long per bit index. The long[] words array therefore grows only as needed: calling set(2147483647) requires 33,554,432 longs = 268,435,456 bytes, while calling only bitset.set(1) needs a single long. Example in jshell:

BitSet b = new BitSet();
b.size();
==> 64 (i.e. words has length 1 and can store 64 bits)
b.set(1);
b.size();
==> 64 (i.e. words still has length 1)
b.set(64);
b.size();
==> 128 (i.e. words has length 2 and can store up to 128 bits)
Usually you use bit sets to index into something else. Let's say you use this bitset to index into an array.
BitSet b = new BitSet();
b.set(2147483647);
ArrayList<X> items = new ArrayList<X>();
// ...add a looot of elements to the ArrayList...
// then:
X item = items.get(b.nextSetBit(0));
To make this work, the list would have to contain 2,147,483,648 elements, which is more than an ArrayList can even hold (its size is also an int). And even approaching that, the backing array of object references alone would need 8 GiB or more (at least 4 bytes per reference), so a typical JVM would throw an OutOfMemoryError long before the bit set itself became a problem.