
Java String Immutability and Using same string value to create a new string

I know the title of this question is not very clear; I did not know how to phrase it. I have a basic Java question that is mainly about application performance, but it also involves how Java creates String objects.

I understand the concept of String immutability in Java. What I am not sure about is this: I have read somewhere that the following will not create two different String objects:

String name = "Sambhav";
String myName= "Sambhav";

How does Java do that? Does it actually search program memory for an existing String with that value and create a new String object only if none exists? That would obviously save memory, but it seems it would cost performance.

Also, let's say I have code like this:

  public void some_method(){
        String name = "Sambhav";
        System.out.println(name); //  or any random stuff
  }

On each call of this method, is a new String created and added to memory, or is the same String object reused? I am curious how this works internally.

Also, if we say that

String name = "Sambhav";
String myName= "Sambhav";

will not create a new object because both variables refer to the same one, what about

String name = new String("Sambhav");
String myName= new String("Sambhav");

Will Java still detect that the strings are the same and point myName to the object created by the previous statement?

Internally, a String wraps a char array (a byte array since Java 9) and provides methods that operate on it, e.g. substring(int) and split(String).

Strings are immutable, which means any operation that appears to modify a String actually creates a new String and allocates memory for it. Reassigning a variable only changes which object it refers to, as below:

line 1. String a = new String("SomeString");
line 2. a = "SomeStringChanged";

Line 1 allocates a new String object on the heap containing "SomeString", referenced by the variable a. The literal "SomeString" itself is also placed in the String Pool.

Line 2 places the literal "SomeStringChanged" in the String Pool and makes a refer to it, i.e. a no longer points to the heap object from line 1, which is now eligible for GC. (The pooled literal "SomeString" stays in the pool.)

No reuse here

line 3. String  b =  "SomeStringChanged";

Now the literal "SomeStringChanged" is shared by the variables a and b, i.e. they refer to the same memory location, namely the entry in the String Pool.

line 4. a = new String("SomeStringChanged");

Now a new heap object containing "SomeStringChanged" is allocated and referenced by a.

There is no reuse of the String object here. (The literal "SomeStringChanged" is already in the String Pool, so no new pool entry is created.)

line 5. a = new String("SomeStringChanged").intern();

Now the object created on line 4 is no longer referenced and becomes eligible for GC. The temporary object created by new on line 5 is not kept either: intern() returns the pooled instance, so a and b refer to the same String Pool entry containing "SomeStringChanged". The same object is reused here, thanks to the intern() method.
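The identity claims made for lines 1 to 5 can be checked with ==, which compares references rather than contents. The following is a minimal sketch; the class and method names are made up for illustration.

```java
import java.util.Arrays;

class StringPoolWalkthrough {

    // Re-runs lines 1 to 5 and records the outcome of each identity check.
    static boolean[] identityChecks() {
        String a = new String("SomeString");          // line 1: separate heap object
        a = "SomeStringChanged";                      // line 2: a now refers to the pooled literal
        String b = "SomeStringChanged";               // line 3: same pooled literal
        boolean literalsShared = (a == b);

        a = new String("SomeStringChanged");          // line 4: new object with equal contents
        boolean newObjectIsDistinct = (a != b) && a.equals(b);

        a = new String("SomeStringChanged").intern(); // line 5: intern() returns the pooled instance
        boolean internedShared = (a == b);

        return new boolean[] {literalsShared, newObjectIsDistinct, internedShared};
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(identityChecks())); // prints [true, true, true]
    }
}
```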

line 6. String x = new String("SomeX");
line 7. String y = "SomeX";

Line 6 creates a new String object for "SomeX" on the heap, in addition to the pooled literal. Whether the underlying char array is copied or shared between the two objects is implementation specific.

Line 7 does not allocate any memory for "SomeX", since it is already in the String Pool.

line 8. String s = new String(someStringVariable);

Line 8 allocates only a single String object on the heap and adds nothing to the String Pool.
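To illustrate line 8, the sketch below assumes someStringVariable holds a value assembled at run time (here with a StringBuilder, which is an assumption made for the example). Neither that string nor the copy made from it is the pooled instance until intern() is called.

```java
import java.util.Arrays;

class RuntimeStringDemo {

    static boolean[] checks() {
        // Built at run time, so it is not a compile-time constant and is not pooled automatically.
        String someStringVariable = new StringBuilder("Some").append('X').toString();

        String s = new String(someStringVariable);            // line 8: one more heap object
        boolean distinctFromSource = (s != someStringVariable);
        boolean distinctFromLiteral = (s != "SomeX");          // the literal lives in the pool
        boolean equalContents = s.equals("SomeX");
        boolean internReturnsPooled = (s.intern() == "SomeX"); // intern() hands back the pooled literal

        return new boolean[] {distinctFromSource, distinctFromLiteral, equalContents, internReturnsPooled};
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(checks())); // prints [true, true, true, true]
    }
}
```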

In conclusion, a String object is reused only if the String is written as a literal or is interned, i.e. only these two cases make use of the String Pool (which is the mechanism behind the reuse).

Strings that you put in quotes in your source files "like that" are compile-time constants. If their contents match, they are represented by a single entry in the constant pool of your class's bytecode and thus by a single String object at run time.
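The same rule extends to constant expressions: concatenations made only of literals and final constants are folded at compile time and end up as the same pooled object. A small sketch, with illustrative names that are not from the question:

```java
import java.util.Arrays;

class ConstantFoldingDemo {

    static final String FIRST = "Sam"; // a constant variable, usable in constant expressions

    static boolean[] checks() {
        String literal = "Sambhav";
        String folded = "Sam" + "bhav";    // constant expression, folded by the compiler
        String viaFinal = FIRST + "bhav";  // also a constant expression

        String half = "Sam";               // non-final local variable
        String runtime = half + "bhav";    // concatenated at run time, so a new object

        return new boolean[] {
            literal == folded,
            literal == viaFinal,
            literal != runtime,
            literal == runtime.intern()
        };
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(checks())); // prints [true, true, true, true]
    }
}
```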

String name = new String("Sambhav");
String myName= new String("Sambhav");

Those are explicitly different objects: a new String object is created for each call, though it may reuse the char array of the underlying string (the one you pass to the constructor). This happens because the new keyword instructs Java to create a new object. That is why name != myName in this case, even though name.equals(myName).
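This also answers the question about repeated method calls. In the sketch below (method names are hypothetical), a method that returns a literal hands back the same object on every call, while a method that uses new produces a fresh object each time.

```java
class MethodLiteralDemo {

    // Returns the pooled literal; no new String is created per call.
    static String literal() {
        return "Sambhav";
    }

    // Creates a distinct String object on every call.
    static String copy() {
        return new String("Sambhav");
    }

    public static void main(String[] args) {
        System.out.println(literal() == literal());   // prints true
        System.out.println(copy() == copy());         // prints false
        System.out.println(copy().equals(copy()));    // prints true
    }
}
```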

String name = new String("Sambhav");

String myName = new String("Sambhav");

Will Java still detect that the strings are the same and point myName to the object created by the previous statement?

No: new always creates a distinct object. For literals and interned strings, however, the JVM keeps only one instance per distinct value, which it locates by computing a hash.

Those String objects are kept in a String pool.

String pooling

String pooling (sometimes also called string canonicalisation) is the process of replacing several String objects that have equal values but different identities with a single shared String object.

You can achieve this by keeping your own Map<String, String> (possibly with soft or weak references, depending on your requirements) and using the map values as the canonical instances.

Or you can use the String.intern() method provided by the JDK.
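The map-based approach could look roughly like the sketch below. It is a simplified illustration under stated assumptions: the class name is hypothetical, it holds strong references only (so entries are never released), and it is not thread-safe.

```java
import java.util.HashMap;
import java.util.Map;

class StringCanonicalizer {

    // Maps each distinct value to the one instance chosen as canonical.
    private final Map<String, String> pool = new HashMap<>();

    // Returns the canonical instance for s, registering s itself if the value is new.
    String canonicalize(String s) {
        String existing = pool.putIfAbsent(s, s);
        return existing != null ? existing : s;
    }

    int size() {
        return pool.size();
    }

    public static void main(String[] args) {
        StringCanonicalizer canonicalizer = new StringCanonicalizer();
        String first = canonicalizer.canonicalize(new String("Sambhav"));
        String second = canonicalizer.canonicalize(new String("Sambhav"));
        System.out.println(first == second);       // prints true
        System.out.println(canonicalizer.size());  // prints 1
    }
}
```

Unlike String.intern(), such a map is under the application's control, so it can be scoped to one component and discarded as a whole.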

Quick string pool differences by JVM version

In Java 6, the String pool was located in PermGen memory, which is usually small and of limited size. For that reason String.intern() should be avoided there: you can run out of memory.

In Java 7 and 8 it was moved to the heap and is implemented with a hash-table-like data structure.

Since hash-table-like structures (HashMap, WeakHashMap) use a computed hash to access an entry in constant time, the lookup is very fast.

As mentioned in this article:

  • Stay away from the String.intern() method on Java 6, because a fixed-size memory area (PermGen) is used for JVM string pool storage.

  • Java 7 and 8 implement the string pool in heap memory. This means that string pooling in Java 7 and 8 is limited only by the whole application memory.

  • Use the -XX:StringTableSize JVM parameter in Java 7 and 8 to set the string pool map size. The size is fixed, because the pool is implemented as a hash map with lists in the buckets. Estimate the number of distinct strings in your application (which you intend to intern) and set the pool size to a prime number close to that value. This allows String.intern() to run in constant time and requires rather little memory per interned string (an explicitly used Java WeakHashMap would consume 4-5 times more memory for the same task).

  • The default value of the -XX:StringTableSize parameter is 1009 in Java 7 before 7u40, and 60013 from Java 7u40 onward, including Java 8.
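To see which table size a running JVM actually uses, the value can be read through the HotSpot diagnostic MXBean. This is a sketch that assumes a HotSpot-based JVM; the class name is made up, and on other JVMs the lookup may fail, in which case the method falls back to "unknown".

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;

class StringTableSizeProbe {

    // Returns the effective StringTableSize, or "unknown" if the JVM does not expose it.
    static String stringTableSize() {
        try {
            HotSpotDiagnosticMXBean bean =
                    ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
            return bean.getVMOption("StringTableSize").getValue();
        } catch (RuntimeException e) {
            // Assumption: a non-HotSpot JVM or a missing option ends up here.
            return "unknown";
        }
    }

    public static void main(String[] args) {
        System.out.println("StringTableSize = " + stringTableSize());
    }
}
```

Starting the program with, for example, java -XX:StringTableSize=1000003 StringTableSizeProbe (the value is only an illustration) should then report the overridden size.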

You are actually showing 3 different reasons why the Strings may use the same buffer internally. Note that sharing a buffer between separate instances is only possible because Strings are immutable; otherwise changes to the buffer would be reflected in the values of the other variables as well.

  1. The compiler detects identical String literals; if a string literal is repeated, the compiler may simply point to the same object instance;

  2. References to a String point to the same object instance and are therefore identical by definition;

  3. Buffer sharing may help during construction with new. If the runtime system sees that the String contents may be shared, it may opt to do so; this behavior is, however, not guaranteed, as it is implementation specific. The object instances should be different (but using them as separate instances would still not be wise).

As an example for #3, the Java 6 OpenJDK source will simply point to the same buffer. If the buffer is larger than the new String instance, a copy will be created so that the Garbage Collector can clear the larger string (otherwise the larger character buffer may be kept in memory indefinitely).

None of this should matter too much to you, unless you get careless and start using == for equality (or other constructs that confuse == with equals).
