Java BitSet: finding all the true bits efficiently?

Suppose a java.util.BitSet is used. The objective is to quickly find all the bits that are set to true. The set bits are in no particular order and follow no pattern. The maximum index of the BitSet will be 2^31 - 48. The total number of bits set to true is (2^31 - 48)/2. In other words, there are about two billion bits that can each be true or false; how can I find all the true bits efficiently?

Each time a bit is set to true, a pass over all the true bits in the BitSet is required. You can see why looping through all 2^31 - 48 bits every time is poor for performance.

Here is a solution that doesn't fit my needs: create an int[] indices of size (2^31 - 48)/2 and, every time a bit i is set to true, store the value i in the next available slot in indices. While this achieves the goal, it would add about 32 * (2^31 - 48)/2 bits to memory, which is around 4.3 GB.

The focus is on performance and repeated computation. Using input/output files or something other than BitSet is not desired.

What is the fastest approach to achieve the desired behaviour? Or... what is a sufficiently quick approach that also uses significantly less memory?

What is the fastest approach to achieve the desired behaviour?

If you are restricting yourself to the BitSet API, then I think you need a loop that repeatedly calls BitSet.nextSetBit. Yes, that is going to entail about 2^30 calls per pass. But I think it is as good as you are going to get using the BitSet API.
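As a minimal sketch of the loop described above: the class name SetBitScan and the method visitSetBits are hypothetical names chosen for illustration, and the counter stands in for whatever work is really needed per set bit. The guard against Integer.MAX_VALUE follows the idiom shown in the BitSet.nextSetBit documentation.

```java
import java.util.BitSet;

final class SetBitScan {
    // Visits every set bit in ascending index order and returns how many were visited.
    static long visitSetBits(BitSet bits) {
        long visited = 0;
        for (int i = bits.nextSetBit(0); i >= 0; i = bits.nextSetBit(i + 1)) {
            visited++; // placeholder for the real per-bit work
            if (i == Integer.MAX_VALUE) {
                break; // i + 1 would overflow and nextSetBit would throw
            }
        }
        return visited;
    }

    public static void main(String[] args) {
        BitSet bits = new BitSet();
        bits.set(3);
        bits.set(64);
        bits.set(1000);
        System.out.println(visitSetBits(bits)); // prints 3
    }
}
```

nextSetBit skips over all-zero words internally, so the cost is dominated by the number of set bits plus the number of words, not by one call per bit position.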

If you want something faster you will either need to invent your own data structure to do this (and I don't have any really good ideas), or change the problem.

Observation: examining 2^30 bits each time one bit changes is going to be massively computationally expensive no matter how you do it.

If this was my problem, I would first look for a smarter solution that avoided having to do that at all. If there was no smart solution, I would probably use an array of int instead of a BitSet and figure out a way to parallelize the scan across 8 / 16 / 32 cores [1]. (But it also depends what you need to do for each bit that is true.)

[1] This assumes that you have idle cores / power / cooling to throw at this problem.
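One way the int-array idea could look, as a rough sketch and not a tuned implementation: the names ParallelBitScan, set and forEachSetBit are hypothetical, and splitting the work with a parallel IntStream is an assumption about how to spread the scan across cores. The callback is invoked from several threads and in no particular order, so it must be thread-safe.

```java
import java.util.concurrent.atomic.LongAdder;
import java.util.function.LongConsumer;
import java.util.stream.IntStream;

final class ParallelBitScan {
    // Sets one bit in the packed array: 32 bits per int word.
    static void set(int[] words, int bitIndex) {
        words[bitIndex >>> 5] |= 1 << (bitIndex & 31);
    }

    // Calls action once for every set bit; words are processed in parallel.
    static void forEachSetBit(int[] words, LongConsumer action) {
        IntStream.range(0, words.length).parallel().forEach(w -> {
            int word = words[w];
            long base = (long) w * 32;
            while (word != 0) {
                action.accept(base + Integer.numberOfTrailingZeros(word));
                word &= word - 1; // clear the lowest set bit
            }
        });
    }

    public static void main(String[] args) {
        int[] words = new int[4]; // room for 128 bits
        set(words, 0);
        set(words, 31);
        set(words, 32);
        set(words, 100);
        LongAdder count = new LongAdder();
        forEachSetBit(words, i -> count.increment());
        System.out.println(count.sum()); // prints 4
    }
}
```

Whether this beats a single-threaded nextSetBit loop depends on the per-bit work: if the action is trivial, memory bandwidth is likely to be the limit long before all cores are busy.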


Or... what is a sufficiently quick approach that also uses significantly less memory?

AFAIK, you can't represent 2^N random true / false values in fewer than O(2^N) bits. Your only hope would be if the bit pattern was non-random and easily compressible. And even then you have the problem of the CPU cost of compression / decompression, and the problem of efficiently updating a bit in the compressed bit sequence. Whether this is feasible would depend on the nature of your bit stream.
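A short counting argument makes the bound concrete. It assumes only what the question states: n bit positions, and in the constrained case exactly n/2 of them set. The symbol n is introduced here for illustration.

```latex
% Unconstrained: n positions give 2^n distinct states, so any lossless encoding needs
\log_2 2^{n} = n \text{ bits.}

% Exactly n/2 bits set: the number of states is the central binomial coefficient,
% and Stirling's approximation gives
\log_2 \binom{n}{n/2} \approx n - \tfrac{1}{2}\log_2\!\left(\tfrac{\pi n}{2}\right).

% For n \approx 2^{31} the correction term is about 16 bits, so knowing that
% half the bits are set saves essentially nothing.
```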
