
TreeSet: number of elements less than a value efficiently

I need a way to calculate the number of elements less than X in a TreeSet of Integers really fast.

I can use the

  • subSet()
  • headSet()
  • tailSet()

methods but they are really slow (I just need the count, not the numbers themselves). Is there a way?

Thank you.


EDIT:

I found a workaround that makes things a lot faster! I am using BitSet and its cardinality() method. I create a BitSet first, and for every element added to the TreeSet I set the corresponding index in the BitSet. Now, to count the number of elements up to X (inclusive, hence the X+1), I use:

bitset.get(0, X+1).cardinality()

This is much faster compared with treeset.subSet(0, true, X, true).size().

Does anyone know why? I assume BitSet.cardinality() doesn't use linear search.
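A likely explanation, at least for the OpenJDK implementation as far as I know: cardinality() sums Long.bitCount over the backing long words, so it covers 64 candidate values per step, whereas size() on a TreeSet range view walks the entries in the range one by one. Below is a minimal sketch of the workaround, assuming all values are non-negative ints. The class and the helper countLessThan are hypothetical names, and the helper uses x rather than X+1 as the upper bound so that it counts strictly smaller values.

```java
import java.util.BitSet;
import java.util.TreeSet;

final class BitSetRankSketch {
    // Hypothetical helper: number of set bits at indices 0..x-1,
    // i.e. stored values strictly less than x. Assumes values are >= 0.
    static int countLessThan(BitSet bits, int x) {
        if (x <= 0) {
            return 0; // BitSet.get rejects negative indices
        }
        // get(0, x) copies the bit range [0, x); cardinality() counts its set bits.
        return bits.get(0, x).cardinality();
    }

    public static void main(String[] args) {
        TreeSet<Integer> set = new TreeSet<>();
        BitSet bits = new BitSet();
        for (int v : new int[] {3, 8, 15, 42, 64, 100}) {
            set.add(v);  // keep the TreeSet for ordered access
            bits.set(v); // mirror every insertion in the BitSet
        }
        System.out.println(countLessThan(bits, 42)); // 3, 8 and 15 are smaller
        System.out.println(set.headSet(42).size());  // same count via the TreeSet view
    }
}
```

The trade-off is memory: the BitSet needs one bit per possible value up to the largest element, and every removal from the TreeSet has to be mirrored with bits.clear(v).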

How fast does 'really fast' need to be? Roughly how many elements do you have?

subSet()/headSet()/tailSet() are O(1) because they return a view of the original treeset, but if you size() your subSet() you are still iterating over all the original elements, hence O(N).

Are you using Java 8? This will be about the same but you can parallelise the cost.

Set<Integer> set = new TreeSet<>();
// .. add things to set

long count = set.parallelStream().filter(e -> e < x).count();

NB EDIT

With further exploration and testing I cannot substantiate the claim "if you size() your subSet() you are still iterating over all the original elements". I was wrong. parallelStream().count() on this 4-core machine was ~30% slower than subSet().size().

If you don't update the data structure, just keep the number of elements less than X in a hashmap!

If you update it infrequently, keep a sorted linked list of numbers. At insert/remove, add/remove from the list (O(1) once the position is found) and update the hashmap (O(n)).
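As a rough illustration of the first suggestion (data that never changes), the counts can be precomputed in one ordered pass over the set. This is only a sketch: PrecomputedRankSketch and buildRanks are hypothetical names, and the map can only answer queries for values that are themselves in the set.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeSet;

final class PrecomputedRankSketch {
    // Maps each element to the number of elements strictly smaller than it.
    // TreeSet iterates in ascending order, so a running counter is enough.
    static Map<Integer, Integer> buildRanks(TreeSet<Integer> set) {
        Map<Integer, Integer> ranks = new HashMap<>();
        int seen = 0;
        for (int v : set) {
            ranks.put(v, seen);
            seen++;
        }
        return ranks;
    }

    public static void main(String[] args) {
        TreeSet<Integer> set = new TreeSet<>();
        for (int v : new int[] {42, 3, 15, 8}) {
            set.add(v);
        }
        Map<Integer, Integer> ranks = buildRanks(set);
        System.out.println(ranks.get(15)); // 2, because 3 and 8 are smaller
    }
}
```

After any insert or remove the map has to be rebuilt (or every entry above the changed value adjusted), which is the O(n) update cost mentioned above.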

You can have O(log(n)) get and O(log(n)) update by using a (sorted, balanced) binary tree. In each node of the tree, also keep the number of its descendants. Now, to get the number of items less than y, you search for y in the binary tree, and whenever you go right instead of left you add the size of the left subtree you skipped (plus one for the node itself). On update you need to update the counts in the ancestors of the new element too.
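The counting walk described above could look roughly like this. It is a sketch under a big assumption: the tree here is a plain unbalanced binary search tree, so the O(log(n)) bounds only hold when insertions happen to keep it shallow. A real implementation would maintain the same size field through the rotations of a red-black or AVL tree. All names are hypothetical.

```java
final class SizeAugmentedBstSketch {
    private static final class Node {
        final int key;
        Node left, right;
        int size = 1; // nodes in the subtree rooted here, including this node

        Node(int key) {
            this.key = key;
        }
    }

    private Node root;

    private static int size(Node n) {
        return n == null ? 0 : n.size;
    }

    // Inserts key if absent (set semantics) and refreshes the sizes of its ancestors.
    void add(int key) {
        root = add(root, key);
    }

    private static Node add(Node n, int key) {
        if (n == null) {
            return new Node(key);
        }
        if (key < n.key) {
            n.left = add(n.left, key);
        } else if (key > n.key) {
            n.right = add(n.right, key);
        }
        // Recomputing from the children stays correct even when key was a duplicate.
        n.size = 1 + size(n.left) + size(n.right);
        return n;
    }

    // Counts keys strictly less than x.
    int countLessThan(int x) {
        int count = 0;
        Node n = root;
        while (n != null) {
            if (x <= n.key) {
                n = n.left; // this node and its right subtree are all >= x
            } else {
                count += 1 + size(n.left); // this node and its whole left subtree are < x
                n = n.right;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        SizeAugmentedBstSketch tree = new SizeAugmentedBstSketch();
        for (int v : new int[] {50, 30, 70, 20, 40, 60, 80}) {
            tree.add(v);
        }
        System.out.println(tree.countLessThan(60)); // 4: 20, 30, 40 and 50
    }
}
```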

By the way, if you are willing to accept approximate answers, there could be faster ways too.

Since all answers so far point to data structures different from Java's TreeSet, I would suggest the Fenwick tree, which has O(log(N)) for updates and queries; see the link for a Java implementation.
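For readers without the linked implementation at hand, here is a minimal Fenwick tree (binary indexed tree) sketch adapted to this question. It assumes the values are ints in a known range 0..maxValue and keeps set semantics with a separate boolean array. The class and method names are hypothetical and this is not the code from the link.

```java
final class FenwickCountSketch {
    private final int[] tree;        // 1-based Fenwick array; value v lives at index v + 1
    private final boolean[] present; // tracks membership so duplicates are not counted twice

    // Supports values in the range 0..maxValue (an assumption of this sketch).
    FenwickCountSketch(int maxValue) {
        tree = new int[maxValue + 2];
        present = new boolean[maxValue + 1];
    }

    // Adds value; returns false if it was already stored.
    boolean add(int value) {
        if (present[value]) {
            return false;
        }
        present[value] = true;
        update(value, 1);
        return true;
    }

    // Removes value; returns false if it was not stored.
    boolean remove(int value) {
        if (!present[value]) {
            return false;
        }
        present[value] = false;
        update(value, -1);
        return true;
    }

    private void update(int value, int delta) {
        // i & -i isolates the lowest set bit, which moves to the next covering node.
        for (int i = value + 1; i < tree.length; i += i & -i) {
            tree[i] += delta;
        }
    }

    // Counts stored values strictly less than x, in O(log(maxValue)).
    int countLessThan(int x) {
        int limit = Math.min(Math.max(x, 0), present.length); // clamp to the supported range
        int sum = 0;
        for (int i = limit; i > 0; i -= i & -i) {
            sum += tree[i];
        }
        return sum;
    }

    public static void main(String[] args) {
        FenwickCountSketch counts = new FenwickCountSketch(100);
        for (int v : new int[] {3, 8, 15, 42, 64, 100}) {
            counts.add(v);
        }
        System.out.println(counts.countLessThan(42)); // 3: the values 3, 8 and 15
    }
}
```

Like the BitSet workaround, this needs memory proportional to the value range rather than to the number of elements, so for sparse or very large values the inputs would first have to be compressed to ranks.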

package ArrayListTrial;

import java.util.Scanner;

public class countArray {

    public static void main(String[] args) {
        // Linear scan: fill the array with 1..100 and count the elements greater than the input.

        int[] array = new int[100];
        Scanner scan = new Scanner(System.in);
        System.out.println("input the number you want to compare:");
        int in = scan.nextInt();
        int count = 0;
        System.out.println("The following is array elements:");
        for(int k=0 ; k<array.length ; k++)
        {
            array[k] = k+1;
            System.out.print(array[k] + " ");
            if(array[k] > in)
            {
                count++;
            }
        }
        System.out.printf("\nThere are %d numbers in the array bigger than %d.\n" , count , in);

    }

}
