
When is using a TreeSet faster than a HashSet?

I have been doing some reading on this topic. So far, from what I understand, for add, remove and search operations HashSet is faster, with O(1) time complexity, while TreeSet takes O(log n) for the same operations. When iterating through the elements, both HashSet and TreeSet have O(n) time complexity.

So what is a use case when TreeSet is faster than HashSet?

In general, you can best compare the capabilities of Java container classes by looking at the interfaces they implement. Checking the HashSet javadoc, you'll see it implements Iterable<E>, Collection<E>, Set<E>. TreeSet implements Iterable<E>, Collection<E>, NavigableSet<E>, Set<E>, SortedSet<E>.

So the difference is SortedSet and NavigableSet. Those interfaces define the methods TreeSet offers and HashSet doesn't. If you in turn look up their javadoc, you'll find a range of behaviors organized to exploit the ordering of elements in a TreeSet. HashSets have no concept of element ordering. That's the main difference.
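To illustrate, here is a small sketch (the class name NavigableDemo is just for the example) that uses a TreeSet through the NavigableSet interface, next to a HashSet that can only be used as a plain Set. All methods called are the standard java.util ones.

```java
import java.util.HashSet;
import java.util.List;
import java.util.NavigableSet;
import java.util.Set;
import java.util.SortedSet;
import java.util.TreeSet;

public class NavigableDemo {
    public static void main(String[] args) {
        List<Integer> data = List.of(40, 10, 30, 20, 50);

        // A HashSet can only be used through Set; assigning it to a
        // NavigableSet<Integer> variable would not compile.
        Set<Integer> hashed = new HashSet<>(data);

        // A TreeSet can be used through NavigableSet (which extends SortedSet).
        NavigableSet<Integer> tree = new TreeSet<>(data);

        System.out.println(hashed.contains(30));      // plain Set operation: true
        System.out.println(tree.first());             // smallest element: 10
        System.out.println(tree.last());              // largest element: 50
        System.out.println(tree.floor(35));           // greatest element <= 35: 30
        System.out.println(tree.higher(30));          // least element > 30: 40
        SortedSet<Integer> below = tree.headSet(30);  // view of elements < 30
        System.out.println(below);                    // [10, 20]
        System.out.println(tree.descendingSet());     // [50, 40, 30, 20, 10]
    }
}
```

None of the calls after the first one has an equivalent on HashSet; you would have to scan every element yourself.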

In practice there aren't too many use cases where the difference between the O(1) expected time of HashSet performance and the O(log n) guaranteed time of TreeSet turns out to be important.

When it is important, you still need to account for the expected (rather than guaranteed) nature of the hash performance, since any given add() can require O(n) time to expand the internal bucket array and rehash all the contents. In some applications, this is a killer. For example, your game normally runs like lightning, but once in a while there's a stutter while a 10 MB hash set is grown to 20 MB. TreeSet has no such dramatic performance quirks. E.g., rebalancing a red-black tree adds only a small constant factor to the cost of simple BST operations, and that factor is always about the same.
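To make the "once in a while" concrete, here is a toy model of a growing hash table; it is not the JDK's code. It assumes the defaults the HashSet javadoc documents (initial capacity 16, load factor 0.75) and, as an additional assumption about the implementation, that the bucket array doubles whenever the size exceeds capacity * load factor. It counts how many elements get rehashed.

```java
public class ResizeModel {
    // Hypothetical doubling hash table, assumed parameters:
    // initial capacity 16, load factor 0.75, capacity doubles on each resize.
    // Returns {number of resizes, total elements rehashed, largest single rehash}.
    static long[] simulate(int n) {
        long capacity = 16;
        final double loadFactor = 0.75;
        long threshold = (long) (capacity * loadFactor);
        long resizes = 0, totalMoved = 0, worstSingle = 0;
        for (long size = 1; size <= n; size++) {   // set size after each add()
            if (size > threshold) {
                // This add() pays for rehashing every element currently stored.
                resizes++;
                totalMoved += size;
                worstSingle = Math.max(worstSingle, size);
                capacity *= 2;
                threshold = (long) (capacity * loadFactor);
            }
        }
        return new long[] {resizes, totalMoved, worstSingle};
    }

    public static void main(String[] args) {
        for (int n : new int[] {100, 1_000_000}) {
            long[] r = simulate(n);
            System.out.printf("n=%d: %d resizes, %d elements rehashed in total, "
                    + "worst single add() rehashed %d%n", n, r[0], r[1], r[2]);
        }
    }
}
```

For n = 100 the model resizes 4 times (at sizes 13, 25, 49 and 97), rehashing 184 elements in total. The total is bounded by about 2n, which is why the average add() is still O(1), but the one add() that triggers the last resize pays for all 97 elements at once: that is the stutter described above.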

The added value of TreeSet is not its complexity but the kind of data structure it is. HashSet's complexity is better than TreeSet's in every case except iteration, where both have the same complexity.

HashSet: the add, remove, and contains methods have constant expected time complexity, O(1).

TreeSet: the add, remove, and contains methods have time complexity O(log n).

But TreeSet is better suited to some problems, and for those your program will perform better than it would with a HashSet.

Of the methods that are defined on both TreeSet and HashSet, none are reliably faster on TreeSet. There are additional methods on TreeSet, such as floor and ceiling, that couldn't be implemented efficiently on HashSet, so HashSet doesn't offer them at all.

TreeSet is faster than HashSet in some use cases where ordering is relevant to the task you are performing.

For example, suppose I have a mutable set of strings and I frequently want to find the next string in the set that is greater than or equal to a given string:

  • With a HashSet I have to iterate over the entire set to find that string (in the case where the given string is not itself in the set). That is O(N).
  • With a TreeSet I can use ceiling to find the required string in O(log N).
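A sketch of the two approaches from the list above. The helper ceilingByScan is a hypothetical name for the HashSet workaround; TreeSet.ceiling is the real NavigableSet method.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

public class CeilingDemo {
    // HashSet workaround: no ordering to exploit, so every element is examined. O(N).
    static String ceilingByScan(Set<String> set, String key) {
        String best = null;
        for (String s : set) {
            // keep the smallest string that is >= key
            if (s.compareTo(key) >= 0 && (best == null || s.compareTo(best) < 0)) {
                best = s;
            }
        }
        return best; // null if every element is < key
    }

    public static void main(String[] args) {
        List<String> words = List.of("apple", "cherry", "grape", "mango", "peach");
        Set<String> hashed = new HashSet<>(words);
        TreeSet<String> tree = new TreeSet<>(words);

        System.out.println(ceilingByScan(hashed, "banana")); // cherry
        // TreeSet: ceiling descends the red-black tree. O(log N).
        System.out.println(tree.ceiling("banana"));          // cherry
        System.out.println(tree.ceiling("zebra"));           // null
    }
}
```

floor, higher and lower work the same way on a TreeSet, and each would need its own full scan on a HashSet.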

Another example: if I want to iterate over the set of strings in order, that is O(N) for a TreeSet. For a HashSet I have to extract the strings to an array, sort the array, and iterate over that. All in all that is O(N log N).
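Both routes to ordered iteration, sketched with hypothetical helper names (inOrderFromHashSet, inOrderFromTreeSet). The HashSet version follows the steps described above: copy to an array, sort, then iterate.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

public class OrderedIterationDemo {
    // HashSet: iteration order is unspecified, so copy into an array and sort. O(N log N).
    static List<String> inOrderFromHashSet(Set<String> set) {
        String[] arr = set.toArray(new String[0]);
        Arrays.sort(arr);
        return Arrays.asList(arr);
    }

    // TreeSet: iterating already visits elements in ascending order. O(N).
    static List<String> inOrderFromTreeSet(TreeSet<String> set) {
        List<String> out = new ArrayList<>(set.size());
        for (String s : set) {
            out.add(s);
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> words = List.of("peach", "apple", "mango", "cherry");
        System.out.println(inOrderFromHashSet(new HashSet<>(words)));
        System.out.println(inOrderFromTreeSet(new TreeSet<>(words)));
    }
}
```

Both lines print [apple, cherry, mango, peach]; the difference is that only the HashSet route pays for a sort each time.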


Caveat: complexity and performance are not the same thing. For instance, an O(N log N) solution can be faster than an O(N) solution when N is relatively small. And ... in theory ... HashSet operations are no longer O(1) when the set size exceeds 2^31, because the standard HashSet implementation uses a Java array as the hash array.

Technically, the two can't be fairly compared. A HashSet implements Set, while a TreeSet implements NavigableSet, which has extra functionality based around the concept of ordering its elements (although there is no requirement for the implementation to actually order them).

A HashSet is faster than a TreeSet (O(1) vs O(log n)) for all Set methods.

A TreeSet offers NavigableSet methods that are O(log n), which are “faster” only because they exist.

A TreeSet also iterates over its elements in Comparable order.
