
String interning and HashSet in Java

I have read about string interning, in which String literals are reused, whereas String objects created using new aren't reused. This can be seen below, where I print true and false for their equality. To be specific, p1 == p2 is true but p1 == p3 is false, so there are two objects: one pointed to by p1 and p2, and another pointed to by p3. However, when I add them to a HashSet, they are all considered the same. I was expecting a.size() to return 2, but it returned 1. Why is this so?

package collections;

import java.util.HashSet; 

public class Col {
    public static void main(String[] args) {
        method1();
    }

    public static void method1() {
        // Typed set instead of the raw HashSet type
        HashSet<String> a = new HashSet<>();
        String p1 = "Person1";              // string literal
        String p2 = "Person1";              // same literal
        String p3 = new String("Person1");  // explicitly created String object

        // == compares references, not contents
        System.out.println(p1 == p2);
        System.out.println(p1 == p3);

        a.add(p1);
        a.add(p2);
        a.add(p3);

        System.out.println(a.size());
    }
    }
}

Output

true
false
1

HashSet uses equality, not identity, to keep a unique set of values (i.e., if two objects are equal to each other according to equals() but are not ==, a HashSet will keep only one of them).

You can implement a set that uses identity instead of equality by using the JDK's IdentityHashMap with a dummy value shared between all keys, in much the same way that HashSet is built on HashMap.
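As a minimal sketch of the identity-based set described above: the JDK helper Collections.newSetFromMap wraps a map as a set, so passing it an IdentityHashMap gives a set that compares elements with == rather than equals(). The class and method names (IdentitySetDemo, identityCount) are made up for this illustration.

```java
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Set;

class IdentitySetDemo {
    // Adds the three strings from the question to an identity-based set
    // and returns how many distinct references it kept.
    static int identityCount() {
        // Set view backed by an IdentityHashMap: membership is decided by ==
        Set<String> identitySet =
                Collections.newSetFromMap(new IdentityHashMap<String, Boolean>());

        String p1 = "Person1";
        String p2 = "Person1";              // same pooled object as p1
        String p3 = new String("Person1");  // separate object with equal contents

        identitySet.add(p1);
        identitySet.add(p2);
        identitySet.add(p3);
        return identitySet.size();
    }

    public static void main(String[] args) {
        System.out.println(identityCount()); // 2
    }
}
```

Here p1 and p2 collapse into one entry because they are the same reference, while p3 is kept as a second entry.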

The reasoning in the question, that there are two objects (one pointed to by p1 and p2, another by p3), holds only when you compare the strings using ==. The result is different when you compare them using the equals() method (if in doubt, you can test it).

When adding to a HashSet, the comparison method used is equals(), as is proper for objects. By that measure, p1, p2 and p3 are all equal.

If you change the test to use equals(), it will output true, true, 1 instead of true, false, 1.
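A sketch of that test, with the question's == checks swapped for equals(). The class name EqualsDemo and its helper methods are invented for this example.

```java
import java.util.HashSet;
import java.util.Set;

class EqualsDemo {
    // Compares a literal with a String created via new, using equals().
    static boolean literalEqualsNew() {
        String p1 = "Person1";
        String p3 = new String("Person1");
        return p1.equals(p3);
    }

    // Adds all three strings to a HashSet and returns the resulting size.
    static int setSize() {
        Set<String> a = new HashSet<>();
        a.add("Person1");
        a.add("Person1");
        a.add(new String("Person1"));
        return a.size();
    }

    public static void main(String[] args) {
        String p1 = "Person1";
        String p2 = "Person1";
        System.out.println(p1.equals(p2));      // true
        System.out.println(literalEqualsNew()); // true
        System.out.println(setSize());          // 1
    }
}
```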

p1 and p2 are string literals, and they point to the same object because of the string pool. So when we compare them using ==, they match.

p3 is a separately created String object, so when we compare it using ==, the references are compared and the result is false.

HashSet's add method calls HashMap's put method internally. HashMap's put method uses the key's hashCode and equals methods to decide whether the key is already present. String overrides hashCode and equals so that strings with the same value produce the same hash code and compare as equal. A HashSet contains only unique values, so it stores just one of them.
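The behaviour described above can be checked with a short sketch. It relies only on String.hashCode() and on the boolean returned by Set.add, which is false when an equal element is already present. The names HashCodeDemo, sameHash and secondAddAccepted are hypothetical.

```java
import java.util.HashSet;
import java.util.Set;

class HashCodeDemo {
    // Equal string contents give equal hash codes, regardless of how
    // the String object was created.
    static boolean sameHash() {
        String p1 = "Person1";
        String p3 = new String("Person1");
        return p1.hashCode() == p3.hashCode();
    }

    // Reports whether the set accepted a second, equal-but-distinct String.
    static boolean secondAddAccepted() {
        Set<String> set = new HashSet<>();
        set.add("Person1");
        // add returns false because an equal element is already in the set
        return set.add(new String("Person1"));
    }

    public static void main(String[] args) {
        System.out.println(sameHash());          // true
        System.out.println(secondAddAccepted()); // false
    }
}
```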

This is one of those cases where I would recommend learning how to use javap to understand how your code is compiled but let me try to explain what is going on under the hood.

When Java compiles that class, it creates instructions for building what is called the constant pool for that class. That constant pool will hold a reference to a string with the value "Person1". The compiled logic will also say that the values of p1 and p2 should be set to the constant pool's reference to that string (the address in memory where it lives). Evaluating p1 == p2 returns true because the two variables literally have the same exact value. When you write String p3 = new String("Person1"); you are telling Java to create a new string in a different place in memory, which is merely a copy of the original one, and then set p3's value to a reference to the place in memory where the new string object lives. So p1 == p3 returns false, because what you are asking is "does p1's location in memory equal p3's location in memory?"
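One way to see the pooled reference at work is String.intern(), which returns the pooled instance for a string's value. The sketch below is only an illustration of that idea; InternDemo and its method names are invented here.

```java
class InternDemo {
    // A literal and a String created with new are different objects.
    static boolean sameBeforeIntern() {
        String p1 = "Person1";
        String p3 = new String("Person1");
        return p1 == p3;
    }

    // intern() hands back the pooled instance, which is the same object
    // that the literal refers to.
    static boolean sameAfterIntern() {
        String p1 = "Person1";
        String p3 = new String("Person1");
        return p1 == p3.intern();
    }

    public static void main(String[] args) {
        System.out.println(sameBeforeIntern()); // false
        System.out.println(sameAfterIntern());  // true
    }
}
```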

As others have pointed out, p1.equals(p3) returns true because .equals compares the string values instead of the references. A HashSet sees all three as the same element because it locates entries with .hashCode, which String computes from the string's value, and then confirms the match with .equals.

Hopefully that clears up some of the confusion!


 