
What is the space complexity of bitset in this scenario

I am doing a LeetCode problem where I have to find the duplicate in an array of n + 1 integers whose values are in the range [1, n] inclusive, and I came upon this solution:

    public int findDuplicate(int[] nums) {
        BitSet bit = new BitSet();
        for(int num : nums) {
            if(!bit.get(num)) {
                bit.set(num);
            } else {
                return num;
            }
        }
        return -1;
    }

The use of BitSet here, I'm assuming, is similar to using a boolean[] to keep track of whether we have already seen the current number. So my question is: what is the space complexity of this? The runtime seems to be O(n), where n is the size of the input array. Would the same be true for the space complexity?
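
For reference, the boolean[] version I have in mind would be something like this (a sketch; the class wrapper is only there to make it self-contained):

```java
class FindDuplicateBoolArray {
    // boolean[] version of the BitSet idea.
    // Assumes nums has length n + 1 with values in [1, n], per the problem statement.
    static int findDuplicate(int[] nums) {
        boolean[] seen = new boolean[nums.length]; // indices 0..n
        for (int num : nums) {
            if (seen[num]) {
                return num; // second sighting: this is the duplicate
            }
            seen[num] = true; // first sighting: mark it
        }
        return -1; // unreachable for valid input
    }
}
```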

Link to problem: https://leetcode.com/problems/find-the-duplicate-number/

Your BitSet creates an underlying long[] to store the values. Reading the code of BitSet#set, I would say it's safe to say the array will never be longer than max(nums) / 64 * 2 = max(nums) / 32 entries. Since long has a fixed size, this comes down to O(max(nums)) space. If nums contains large values, you can do better with a hash map.

I'm trying this out with simple code, and it seems to corroborate my reading of the code.

    BitSet bitSet = new BitSet();

    bitSet.set(100);
    System.out.println(bitSet.toLongArray().length); // 2 (max(nums) / 32 = 3.125)

    bitSet.set(64000);
    System.out.println(bitSet.toLongArray().length); // 1001 (max(nums) / 32 = 2000)

    bitSet.set(100_000);
    System.out.println(bitSet.toLongArray().length); // 1563 (max(nums) / 32 = 3125)

Note that the factor of 2 I added is conservative; in general the factor will be smaller, which is why my formula consistently over-estimates the actual length of the long array, but never by more than a factor of 2. This is the code in BitSet that made me add it:

    private void ensureCapacity(int wordsRequired) {
        if (words.length < wordsRequired) {
            // Allocate larger of doubled size or required size
            int request = Math.max(2 * words.length, wordsRequired);
            words = Arrays.copyOf(words, request);
            sizeIsSticky = false;
        }
    }

In summary, I would say the bit set is only a good idea if your values are small relative to how many of them you have. For example, if you have only two values but each is over a billion, you will needlessly allocate an array of several million longs.

Additionally, even when the values remain small, this solution pays for growth when the values arrive in increasing order (e.g., a sorted array): BitSet#set reallocates and copies the array each time the capacity is exceeded. Thanks to the doubling in ensureCapacity this stays amortized linear in max(nums), not quadratic, but the repeated copies still cost time and cause memory churn. To avoid them entirely, first find the maximum, allocate the necessary length in the BitSet, and only then go through the array.
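
That pre-sized variant might look like the following sketch (the extra pass finds the maximum so the BitSet(nbits) constructor can allocate enough words up front; class and variable names are illustrative):

```java
import java.util.BitSet;

class FindDuplicatePresized {
    static int findDuplicate(int[] nums) {
        // First pass: find the maximum so the BitSet can be sized once.
        int max = 0;
        for (int num : nums) {
            max = Math.max(max, num);
        }
        // BitSet(nbits) allocates enough words for nbits bits up front,
        // so the scan below never triggers ensureCapacity's copy.
        BitSet bits = new BitSet(max + 1);
        for (int num : nums) {
            if (bits.get(num)) {
                return num;
            }
            bits.set(num);
        }
        return -1;
    }
}
```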

At this point, using a map is simpler and fits all situations. If speed really matters, my bet is that the BitSet will beat a map under specific conditions (many values, but small ones, and with the bit set pre-sized as described).
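
A minimal sketch of that map-based version (a HashSet is enough here, since we only need membership; its space usage is O(n) regardless of how large the values are):

```java
import java.util.HashSet;
import java.util.Set;

class FindDuplicateHashSet {
    static int findDuplicate(int[] nums) {
        Set<Integer> seen = new HashSet<>();
        for (int num : nums) {
            // add() returns false when the element is already present.
            if (!seen.add(num)) {
                return num;
            }
        }
        return -1;
    }
}
```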
