
How do sets avoid duplicates internally?

I have a question about Set in the Collections framework. How does a set identify duplicates internally? Could anyone please explain how it is implemented, and how the hashCode and equals methods come into the picture? I need a brief explanation, as it is really important for me.

It roughly works like this:

if (!collection.contains(element))
    collection.add(element);

The contains method, in turn, uses equals/hashCode.

In TreeSet, the elements are stored in a red-black tree, whereas HashSet uses a HashMap.

In fact, where an element ends up in the container depends on the element itself (its spot in the tree, its bucket in the hash table), so the add operation itself uses equals/hashCode (or the element's ordering, in the case of a tree).

This is explained in the Javadoc for Set:

A collection that contains no duplicate elements. More formally, sets contain no pair of elements e1 and e2 such that e1.equals(e2), and at most one null element.

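HashSet uses hashCode() to determine which bucket an object should go into, and the equals() method to check for equality against the objects already in that bucket.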

The actual implementation depends on the container. HashMap looks up the item's bucket from its hashCode, then compares the object being inserted with the ones already stored there using equals (this is one of the reasons for requiring that a.equals(b) iff b.equals(a)).
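To illustrate the bucket-then-equals lookup, here is a minimal sketch. The classes RawPoint and ValuePoint are hypothetical names made up for this example: the first keeps the identity-based equals/hashCode inherited from Object, the second overrides both consistently.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

public class HashSetDuplicateDemo {

    // Hypothetical class that keeps Object's identity-based equals/hashCode.
    static final class RawPoint {
        final int x, y;
        RawPoint(int x, int y) { this.x = x; this.y = y; }
    }

    // Hypothetical class that overrides equals and hashCode consistently.
    static final class ValuePoint {
        final int x, y;
        ValuePoint(int x, int y) { this.x = x; this.y = y; }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof ValuePoint)) return false;
            ValuePoint other = (ValuePoint) o;
            return x == other.x && y == other.y;
        }

        @Override
        public int hashCode() {
            return Objects.hash(x, y);
        }
    }

    static int rawCount() {
        Set<RawPoint> set = new HashSet<>();
        set.add(new RawPoint(1, 2));
        set.add(new RawPoint(1, 2)); // distinct object, not equal by default
        return set.size();
    }

    static int valueCount() {
        Set<ValuePoint> set = new HashSet<>();
        set.add(new ValuePoint(1, 2));
        set.add(new ValuePoint(1, 2)); // same hash, equals() returns true
        return set.size();
    }

    public static void main(String[] args) {
        System.out.println(rawCount());   // 2
        System.out.println(valueCount()); // 1
    }
}
```

Without the overrides, the set treats two points with the same coordinates as different elements, because the default equals compares object identity.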

TreeMap, on the other hand, relies on the result of the compareTo method (if the element implements Comparable) or of the compare method implemented by a Comparator. If the comparison returns 0, the elements are regarded as "equal". (Note that compareTo should be consistent with equals, i.e. a.compareTo(b)==0 iff a.equals(b).)
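As a small sketch of the difference, the example below adds "abc" and "ABC" to a TreeSet built with String.CASE_INSENSITIVE_ORDER and to a HashSet. The class and method names are chosen for illustration only. The comparator reports 0 for the two strings, so the TreeSet rejects the second one, while the HashSet, which uses equals, accepts it.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.TreeSet;

public class TreeSetCompareDemo {

    // Adds "abc" then "ABC" and returns the result of each add call.
    static boolean[] addTwice(Set<String> set) {
        boolean first = set.add("abc");
        boolean second = set.add("ABC");
        return new boolean[] { first, second };
    }

    public static void main(String[] args) {
        // The comparator returns 0 for "abc" vs "ABC", so the tree sees a duplicate.
        boolean[] tree = addTwice(new TreeSet<String>(String.CASE_INSENSITIVE_ORDER));
        // "abc".equals("ABC") is false, so the hash set keeps both.
        boolean[] hash = addTwice(new HashSet<String>());

        System.out.println(tree[1]); // false
        System.out.println(hash[1]); // true
    }
}
```

This is an ordering that is not consistent with equals, which is why the two sets disagree about what counts as a duplicate.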

"Can u please explain with these example. s.add("123");s.add("123");"

For the above question, in the context of the Set interface, please refer to the snippet and explanation below.

public void setTest() {
    Set<String> obj = new HashSet<>();
    System.out.println(obj.add("123")); //Output : true
    System.out.println(obj.add("123")); //Output : false
}

Notice that in the snippet above we add "123" twice. The first add returns true, so System.out.println prints true; the second add of "123" returns false.

This shows that if the same value is added to a Set a second time or more, the duplicate value is skipped.

Set is an interface with many different implementations; let's take HashSet for now. To answer your question, I downloaded the source code, went into the HashSet class, and looked at the add method: it uses a HashMap to store unique values. The element being added becomes a key of the HashMap, and the value for that key is a constant dummy object (PRESENT in the snippet below). As we know, the keys of a map are unique. Since put returns the previous value for the key, or null if there was none, add returns true only when the element was not already present. That is how it works. Code below:

private static final Object PRESENT = new Object();
public boolean add(E e) {
    return map.put(e, PRESENT)==null;
}
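To see the delegation end to end, here is a minimal, self-contained sketch built around the same two lines. MiniHashSet is a hypothetical class written for this example, not the JDK's HashSet, and it leaves out everything except add, contains and size.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, stripped-down set that delegates to a HashMap.
public class MiniHashSet<E> {

    // Shared dummy value stored against every key.
    private static final Object PRESENT = new Object();

    private final Map<E, Object> map = new HashMap<>();

    // put returns null only when the key was not in the map before.
    public boolean add(E e) {
        return map.put(e, PRESENT) == null;
    }

    public boolean contains(Object o) {
        return map.containsKey(o);
    }

    public int size() {
        return map.size();
    }

    public static void main(String[] args) {
        MiniHashSet<String> s = new MiniHashSet<>();
        System.out.println(s.add("123")); // true
        System.out.println(s.add("123")); // false
        System.out.println(s.size());     // 1
    }
}
```

The second add overwrites the dummy value for the existing key, so the map still holds a single entry.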
