How does the search algorithm work with objects in a Java collection such as HashSet?

The question is really about objects that change dynamically while they are in a collection. Does the "contains" method compare against each object individually every time, or does it do something clever?

If you have 10000 entries in a collection, I would have expected it to work a bit more cleverly, but I am not sure. If not, is there a way to optimise it by adding a hook that tells the collection to update the hash codes of the objects that have changed?

Additional Question:

Thanks for the answers below. Can I also ask what happens in the case of ArrayList? I could not find anything in the documentation that says not to put mutable objects in an ArrayList. Does that mean the search algorithm simply compares against the hash code of each object?

Hash-based collections hash the object and look it up by its hash code. If an entry with that hash is there, the objects themselves are then compared with equals(). This is because two or more objects that have the same hash might not be equal.

Since Java's hash collections use buckets (chaining), they have to look at the objects in the matching bucket. These objects are kept in a linked list (not java.util.LinkedList, but a custom list; since Java 8, an unusually long chain is converted to a balanced tree).

This is generally very efficient: HashSet.contains() runs in expected O(1) time (constant time on average), provided hashCode() spreads the elements well across the buckets.
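The lookup described above can be sketched as a tiny chained hash set. This is only an illustration under simplifying assumptions, not the real java.util.HashSet source: the class name ChainedSetSketch is hypothetical, the table has a fixed capacity and never resizes, and buckets are plain singly linked chains.

```java
import java.util.Objects;

/** Hypothetical, simplified chained hash set; not the actual JDK implementation. */
class ChainedSetSketch<E> {

    /** One link in a bucket's chain. The hash is cached when the element is added. */
    private static final class Node<E> {
        final E value;
        final int hash;
        final Node<E> next;

        Node(E value, int hash, Node<E> next) {
            this.value = value;
            this.hash = hash;
            this.next = next;
        }
    }

    private final Node<E>[] buckets;

    @SuppressWarnings("unchecked")
    ChainedSetSketch(int capacity) {
        buckets = (Node<E>[]) new Node[capacity];
    }

    /** Maps a hash code to a bucket index; floorMod keeps negative hashes in range. */
    private int indexFor(int hash) {
        return Math.floorMod(hash, buckets.length);
    }

    /** Adds the value unless an equal one is already present. */
    boolean add(E value) {
        if (contains(value)) {
            return false;
        }
        int hash = Objects.hashCode(value);
        int i = indexFor(hash);
        buckets[i] = new Node<>(value, hash, buckets[i]); // prepend to the chain
        return true;
    }

    /** Hashes once, jumps to one bucket, then walks only that bucket's chain. */
    boolean contains(Object o) {
        int hash = Objects.hashCode(o);
        for (Node<E> n = buckets[indexFor(hash)]; n != null; n = n.next) {
            // Cheap int comparison first, equals() only for matching hashes.
            if (n.hash == hash && Objects.equals(n.value, o)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        ChainedSetSketch<String> set = new ChainedSetSketch<>(16);
        set.add("alpha");
        set.add("beta");
        System.out.println(set.contains("alpha")); // true
        System.out.println(set.contains("gamma")); // false
    }
}
```

Note that contains() never touches the other buckets, which is why the cost does not grow with the total number of entries as long as the chains stay short.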


Java's docs have an answer to the second part of your question:

Note: Great care must be exercised if mutable objects are used as set elements. The behavior of a set is not specified if the value of an object is changed in a manner that affects equals comparisons while the object is an element in the set. A special case of this prohibition is that it is not permissible for a set to contain itself as an element.

A HashSet computes the hash code of an element when it's added to the set. It stores this in a way which makes it very efficient to find all elements with the same hash code.

Then when you call contains(), it simply has to compute the hash code of the value you're looking for, and find all elements in the set with the same hash code. There may be multiple elements as hash codes aren't unique, but there are likely to be far fewer elements with matching hash codes than there are elements within the set itself. Each matching element is then checked with equals until either a match is found or we've run out of candidates.
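To make the "hash codes aren't unique" point concrete, here is a small sketch using the strings "Aa" and "BB", which happen to have the same String.hashCode() value. The class name HashCollisionDemo and its helper method are made up for this example.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical demo class: equal hash codes do not imply equal objects. */
class HashCollisionDemo {

    /** Adds one string to a fresh HashSet and reports whether the other is found. */
    static boolean containsAfterAdding(String added, String probe) {
        Set<String> set = new HashSet<>();
        set.add(added);
        return set.contains(probe);
    }

    public static void main(String[] args) {
        // Both strings hash to the same int, so they are candidates for each other.
        System.out.println("Aa".hashCode() == "BB".hashCode()); // true

        // equals() settles it: only the string that was actually added is found.
        System.out.println(containsAfterAdding("Aa", "Aa")); // true
        System.out.println(containsAfterAdding("Aa", "BB")); // false
    }
}
```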

EDIT: To answer the second part, which somehow I'd missed on first reading, you won't be able to find the element again. You mustn't change an element used as a key in a hash table or an element in a hash set in any equality-affecting manner, or you will basically break things.
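A short sketch of what "break things" can look like. The Point class and the method names here are hypothetical, and because the Set contract leaves this behaviour unspecified, the first result is only what OpenJDK's HashSet does in this particular case, not something to rely on.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical demo: mutating an element while it sits in a HashSet. */
class MutableElementDemo {

    /** Made-up element type whose equals/hashCode depend on a mutable field. */
    static final class Point {
        int x;

        Point(int x) {
            this.x = x;
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof Point && ((Point) o).x == x;
        }

        @Override
        public int hashCode() {
            return Integer.hashCode(x);
        }
    }

    /** Changes the element in place, then asks the set for that same object. */
    static boolean foundAfterInPlaceMutation() {
        Set<Point> set = new HashSet<>();
        Point p = new Point(1);
        set.add(p);   // stored under the hash for x == 1
        p.x = 2;      // the set is not told; it still files p under the old hash
        return set.contains(p); // looks in the bucket for the new hash
    }

    /** Safer pattern: take the element out, change it, put it back. */
    static boolean foundAfterRemoveMutateAdd() {
        Set<Point> set = new HashSet<>();
        Point p = new Point(1);
        set.add(p);
        set.remove(p); // remove while the hash still matches
        p.x = 2;
        set.add(p);    // re-added under the new hash
        return set.contains(p);
    }

    public static void main(String[] args) {
        System.out.println(foundAfterInPlaceMutation());  // false on OpenJDK
        System.out.println(foundAfterRemoveMutateAdd());  // true
    }
}
```

There is no hook to make the set rehash a changed element; removing it before the change and adding it back afterwards is the usual workaround, and using immutable elements avoids the problem entirely.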

The simple answer is: no, nothing clever happens. If you expect an object's state to change in a way that affects its hashCode() and equals(...) behavior, then you must not store it in a HashSet, nor in any other Set. To quote from http://download.oracle.com/javase/6/docs/api/java/util/Set.html :

Note: Great care must be exercised if mutable objects are used as set elements. The behavior of a set is not specified if the value of an object is changed in a manner that affects equals comparisons while the object is an element in the set. A special case of this prohibition is that it is not permissible for a set to contain itself as an element.

A HashSet uses a HashMap under the hood. Therefore, the contains operation uses the hashCode() method in the object to check if it's present in the hash table implemented by HashMap.
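The idea of a set built on top of a map can be sketched roughly like this. MapBackedSetSketch is a hypothetical name and this is a stripped-down illustration, not the JDK source: elements become map keys, and every key maps to the same dummy value.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical, minimal set that delegates everything to a HashMap. */
class MapBackedSetSketch<E> {

    /** Shared placeholder stored as the value for every key. */
    private static final Object PRESENT = new Object();

    private final Map<E, Object> map = new HashMap<>();

    /** put() returns null only when the key was not in the map before. */
    boolean add(E e) {
        return map.put(e, PRESENT) == null;
    }

    /** The membership test is just a key lookup, driven by hashCode() and equals(). */
    boolean contains(Object o) {
        return map.containsKey(o);
    }

    /** remove() hands back the placeholder if the key was present. */
    boolean remove(Object o) {
        return map.remove(o) == PRESENT;
    }

    int size() {
        return map.size();
    }

    public static void main(String[] args) {
        MapBackedSetSketch<String> set = new MapBackedSetSketch<>();
        System.out.println(set.add("a"));      // true
        System.out.println(set.add("a"));      // false, already present
        System.out.println(set.contains("a")); // true
        System.out.println(set.size());        // 1
    }
}
```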
