
Why do we need String Deduplication when we have the String Pool?

String De-duplication:

Strings consume a lot of memory in any application. Whenever the garbage collector visits String objects, it takes note of their char arrays. It takes their hash value and stores it alongside a weak reference to the array. As soon as it finds another String with the same hash code, it compares the two char by char. If they match as well, one String is modified to point to the char array of the second String. The first char array is then no longer referenced and can be garbage collected.
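The process described above can be simulated in plain Java. This is only an illustration under stated assumptions: FakeString is a hypothetical stand-in for java.lang.String (whose internal array cannot be modified from application code), and the map of hash codes to weak references merely mimics the table the collector keeps internally. In HotSpot the real feature is switched on with the -XX:+UseStringDeduplication flag (originally available only with the G1 collector).

```java
import java.lang.ref.WeakReference;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class DedupSketch {

    // Hypothetical stand-in for java.lang.String: an object that points at a
    // separate char array which the simulated "collector" may swap out.
    static final class FakeString {
        char[] value;

        FakeString(String s) {
            this.value = s.toCharArray();
        }
    }

    // Hash code -> weak references to arrays seen so far (simplified:
    // cleared references are never purged).
    private final Map<Integer, List<WeakReference<char[]>>> table = new HashMap<>();

    // Called for each FakeString the simulated collector visits.
    void deduplicate(FakeString s) {
        int hash = Arrays.hashCode(s.value);
        List<WeakReference<char[]>> bucket =
                table.computeIfAbsent(hash, h -> new ArrayList<>());
        for (WeakReference<char[]> ref : bucket) {
            char[] candidate = ref.get();
            // Equal hashes are not enough: compare the arrays char by char.
            if (candidate != null && Arrays.equals(candidate, s.value)) {
                s.value = candidate; // the old array is now unreferenced
                return;
            }
        }
        bucket.add(new WeakReference<>(s.value));
    }

    public static void main(String[] args) {
        DedupSketch dedup = new DedupSketch();
        FakeString a = new FakeString("Hello World");
        FakeString b = new FakeString("Hello World");
        System.out.println(a.value == b.value); // false: two separate arrays
        dedup.deduplicate(a);
        dedup.deduplicate(b);
        System.out.println(a.value == b.value); // true: b now shares a's array
        System.out.println(a == b);             // false: still two objects
    }
}
```

Note that the two FakeString objects keep their own identity; only the backing array ends up shared.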

String Pool:

All strings used by the Java program are stored here. If two variables are initialized to the same string value, two strings are not created in memory; only one copy is stored, and both variables point to the same memory location.
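A minimal check of the pooling behavior described above, using only string literals, which are the case the pool is guaranteed to cover:

```java
public class LiteralPoolDemo {
    public static void main(String[] args) {
        // Identical literals refer to the same pooled String instance.
        String a = "Hello World";
        String b = "Hello World";
        System.out.println(a == b);      // true: same object
        System.out.println(a.equals(b)); // true: same content
    }
}
```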

So Java already takes care of not creating duplicate strings in the heap by checking whether the string already exists in the string pool. Then what is the purpose of string de-duplication?

If there is code like the following:

    String myString_1 = new String("Hello World");
    String myString_2 = new String("Hello World");

two strings are created in memory even though they are the same. I cannot think of any scenario other than this where string de-duplication is useful. Obviously I must be missing something. What am I missing?

Thanks in advance
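The two-line snippet in the question can be checked directly. This sketch compares the objects by identity (==) and by content (equals):

```java
public class NewStringDemo {
    public static void main(String[] args) {
        String myString_1 = new String("Hello World");
        String myString_2 = new String("Hello World");
        // Each use of `new` allocates a distinct String object...
        System.out.println(myString_1 == myString_2);      // false
        // ...even though the character content is identical.
        System.out.println(myString_1.equals(myString_2)); // true
    }
}
```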

The string pool applies only to strings added to it explicitly, or used as constants in the application. It does not apply to strings created dynamically during the lifetime of the application. String deduplication, however, can apply to any String on the heap, regardless of how it was created.

String de-duplication takes advantage of the extra level of indirection built into String (a String object refers to a separate internal array that holds its characters):

  • With a string pool, you are limited to returning the same object for two identical strings.
  • String de-duplication lets you have multiple distinct String objects sharing the same content.

This removes a limitation of de-duplicating at creation time: your application can keep creating new String objects with identical content while using very little extra memory, because the content of those strings is shared. The process can also run on a completely unrelated schedule, for example in the background while your application does not need much CPU. Since the identity of each String object does not change, de-duplication can be completely hidden from your application.

Compile time vs run time

String pool refers to string constants that are known at compile time.

String deduplication helps if you happen to retrieve (or construct) the same string a million times at run time, e.g. by reading it from a file, an HTTP request, or any other source.
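A small illustration of the compile-time versus run-time distinction (the variable names are illustrative only). A constant expression such as "Hello " + "World" is folded by the compiler and refers to the pooled literal, while a concatenation involving a non-final variable is evaluated at run time and, per the Java Language Specification, produces a newly created String:

```java
public class CompileVsRuntime {
    public static void main(String[] args) {
        String pooled = "Hello World";
        // Constant expression: computed at compile time and pooled like a literal.
        String constant = "Hello " + "World";
        // `prefix` is not declared final, so this concatenation happens at
        // run time and yields a newly created String.
        String prefix = "Hello ";
        String runtime = prefix + "World";

        System.out.println(pooled == constant);     // true
        System.out.println(pooled == runtime);      // false
        System.out.println(pooled.equals(runtime)); // true
    }
}
```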

Just to add to the answers above: on older VMs the string pool is not garbage collected (this has changed since, but don't rely on it). It contains strings that are used as constants in the application, and so will always be needed. If you continually put all your strings in the string pool, you might quickly run out of memory. On top of that, de-duplication is a relatively expensive process, and that cost is wasted if you know you only need the string for a very short time and you have enough memory.

For these reasons, strings created at run time are not put in the string pool automatically. You have to do it explicitly by calling string.intern().
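A minimal sketch of explicit interning: the String created with new is a separate object, but intern() returns the canonical instance from the pool, which is the same object as the literal:

```java
public class InternDemo {
    public static void main(String[] args) {
        String literal = "Hello World";
        String dynamic = new String("Hello World"); // not the pooled instance
        String interned = dynamic.intern();          // canonical pooled instance

        System.out.println(dynamic == literal);  // false
        System.out.println(interned == literal); // true
    }
}
```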

I cannot think of any scenario other than this where string de-duplication is useful.

Well, one other (much more) frequent scenario is the use of StringBuilders. The toString() method of the StringBuilder class clearly creates a new instance in memory:

public final class StringBuilder extends AbstractStringBuilder
                                 implements java.io.Serializable, CharSequence
{
    ...

    @Override
    public String toString() {
       // Create a copy, don't share the array
       return new String(value, 0, count);
    }

    ...

}

The same goes for its thread-safe counterpart, StringBuffer:

public final class StringBuffer extends AbstractStringBuilder
                                implements java.io.Serializable, CharSequence
{
   ...

   @Override
   public synchronized String toString() {
       if (toStringCache == null) {
           toStringCache = Arrays.copyOfRange(value, 0, count);
       }
       return new String(toStringCache, true);
   }

   ...
}

In applications that rely heavily on this, string de-duplication may reduce memory usage.
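The copying behavior of toString() can also be observed without reading the JDK source: each call hands back a distinct String object with equal content. StringBuffer's internal caching differs between JDK versions, but in the implementation quoted above and in later JDKs it still returns a new String object on each call.

```java
public class BuilderCopies {
    public static void main(String[] args) {
        StringBuilder sb = new StringBuilder("Hello World");
        String first = sb.toString();
        String second = sb.toString();
        // Each call allocates a new String with its own copy of the content.
        System.out.println(first == second);      // false
        System.out.println(first.equals(second)); // true

        StringBuffer buf = new StringBuffer("Hello World");
        // Caching aside, every call still yields a separate String object.
        System.out.println(buf.toString() == buf.toString()); // false
    }
}
```

Every one of these copies is a candidate for de-duplication, which is why builder-heavy code is where the feature tends to pay off.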

From the documentation of the String(String original) constructor:

"Initializes a newly created String object so that it represents the same sequence of characters as the argument; in other words, the newly created string is a copy of the argument string. Unless an explicit copy of original is needed, use of this constructor is unnecessary since Strings are immutable."

So my sense is that this constructor of the String class is not normally needed, at least not the way you have used it above. I guess it is provided merely for the sake of completeness, or for when you do not want to share the copy (kind of unnecessary now; refer here for what I am talking about), but other constructors are still useful, such as creating a String object from a char array, and so on.
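As an example of a constructor that is genuinely useful, String(char[]) builds a String from a char array. Its Javadoc states that the array contents are copied, so later changes to the array do not affect the String, which this sketch checks:

```java
public class CharArrayConstructor {
    public static void main(String[] args) {
        char[] chars = {'h', 'i'};
        // The constructor copies the array, so the String is independent of it.
        String s = new String(chars);
        chars[0] = 'H';
        System.out.println(s);                     // hi
        System.out.println(String.valueOf(chars)); // Hi
    }
}
```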
