
What is Java String interning?

What is String interning in Java, when should I use it, and why?

http://docs.oracle.com/javase/7/docs/api/java/lang/String.html#intern()

Basically, calling String.intern() on a series of strings ensures that all strings with the same contents share the same memory. So if you have a list of names in which 'john' appears 1000 times, interning ensures that only one 'john' is actually allocated memory.
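A minimal sketch of that claim, assuming the names arrive as separate String objects (simulated here with new String). The class InternedNames and its helper methods are made up for illustration; they count distinct objects by identity before and after interning.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

class InternedNames {
    // Counts distinct String objects by identity (==), not by content.
    static int countDistinctInstances(List<String> strings) {
        Set<String> identities =
                Collections.newSetFromMap(new IdentityHashMap<String, Boolean>());
        identities.addAll(strings);
        return identities.size();
    }

    // Builds n separate String objects that all contain "john".
    static List<String> duplicateNames(int n) {
        List<String> names = new ArrayList<String>();
        for (int i = 0; i < n; i++) {
            names.add(new String("john")); // new String forces a fresh object
        }
        return names;
    }

    // Returns a list in which every element is the canonical interned copy.
    static List<String> internAll(List<String> strings) {
        List<String> result = new ArrayList<String>(strings.size());
        for (String s : strings) {
            result.add(s.intern());
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> names = duplicateNames(1000);
        System.out.println(countDistinctInstances(names));            // 1000
        System.out.println(countDistinctInstances(internAll(names))); // 1
    }
}
```

Once the original list is dropped, the 1000 uninterned copies become garbage and only the single interned 'john' stays reachable.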

This can be useful for reducing the memory requirements of your program. But be aware that the cache is maintained by the JVM in the permanent memory pool, which is usually limited in size compared to the heap, so you should not use intern() unless you have many duplicate values.


More on memory constraints of using intern()

On one hand, it is true that you can remove String duplicates by internalizing them. The problem is that the internalized strings go to the Permanent Generation, which is an area of the JVM that is reserved for non-user objects, like Classes, Methods and other internal JVM objects. The size of this area is limited, and is usually much smaller than the heap. Calling intern() on a String has the effect of moving it out from the heap into the permanent generation, and you risk running out of PermGen space.

-- From: http://www.codeinstructions.com/2009/01/busting-javalangstringintern-myths.html


From JDK 7 (I mean in HotSpot), something has changed.

In JDK 7, interned strings are no longer allocated in the permanent generation of the Java heap, but are instead allocated in the main part of the Java heap (known as the young and old generations), along with the other objects created by the application. This change will result in more data residing in the main Java heap, and less data in the permanent generation, and thus may require heap sizes to be adjusted. Most applications will see only relatively small differences in heap usage due to this change, but larger applications that load many classes or make heavy use of the String.intern() method will see more significant differences.

-- From Java SE 7 Features and Enhancements

Update: Interned strings are stored in the main heap from Java 7 onwards. http://www.oracle.com/technetwork/java/javase/jdk7-relnotes-418459.html#jdk7changes

There are some "catchy interview" questions, such as why you get "equals!" if you execute the piece of code below.

String s1 = "testString";
String s2 = "testString";
if(s1 == s2) System.out.println("equals!");

If you want to compare Strings you should use equals(). The above prints "equals!" because the literal "testString" is already interned for you by the compiler. You can intern strings yourself using the intern() method, as shown in the previous answers.

JLS

JLS 7 3.10.5 defines it and gives a practical example:

Moreover, a string literal always refers to the same instance of class String. This is because string literals - or, more generally, strings that are the values of constant expressions (§15.28) - are "interned" so as to share unique instances, using the method String.intern.

Example 3.10.5-1. String Literals

The program consisting of the compilation unit (§7.3):

package testPackage;
class Test {
    public static void main(String[] args) {
        String hello = "Hello", lo = "lo";
        System.out.print((hello == "Hello") + " ");
        System.out.print((Other.hello == hello) + " ");
        System.out.print((other.Other.hello == hello) + " ");
        System.out.print((hello == ("Hel"+"lo")) + " ");
        System.out.print((hello == ("Hel"+lo)) + " ");
        System.out.println(hello == ("Hel"+lo).intern());
    }
}
class Other { static String hello = "Hello"; }

and the compilation unit:

package other;
public class Other { public static String hello = "Hello"; }

produces the output:

 true true true true false true
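The fourth and fifth results hinge on whether the concatenation is a compile-time constant expression. Here is a small sketch of that distinction using a final local variable; the class and method names are made up for illustration and are not part of the JLS example.

```java
class ConstantExpressionDemo {
    // A final local initialized with a literal is a constant variable, so
    // "Hel" + lo is a constant expression and is folded into "Hello".
    static boolean foldedAtCompileTime() {
        final String lo = "lo";
        return "Hello" == ("Hel" + lo);
    }

    // A method parameter is not a constant, so the concatenation happens
    // at run time and yields a newly created String object.
    static boolean concatenatedAtRunTime(String lo) {
        return "Hello" == ("Hel" + lo);
    }

    // Interning the run-time result maps it back to the pooled literal.
    static boolean internedAtRunTime(String lo) {
        return "Hello" == ("Hel" + lo).intern();
    }

    public static void main(String[] args) {
        System.out.println(foldedAtCompileTime());       // true
        System.out.println(concatenatedAtRunTime("lo")); // false
        System.out.println(internedAtRunTime("lo"));     // true
    }
}
```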

JVMS

JVMS 7 5.1 says that interning is implemented magically and efficiently with a dedicated CONSTANT_String_info struct (unlike most other objects, which have more generic representations):

A string literal is a reference to an instance of class String, and is derived from a CONSTANT_String_info structure (§4.4.3) in the binary representation of a class or interface. The CONSTANT_String_info structure gives the sequence of Unicode code points constituting the string literal.

The Java programming language requires that identical string literals (that is, literals that contain the same sequence of code points) must refer to the same instance of class String (JLS §3.10.5). In addition, if the method String.intern is called on any string, the result is a reference to the same class instance that would be returned if that string appeared as a literal. Thus, the following expression must have the value true:

 ("a" + "b" + "c").intern() == "abc"

To derive a string literal, the Java Virtual Machine examines the sequence of code points given by the CONSTANT_String_info structure.

  • If the method String.intern has previously been called on an instance of class String containing a sequence of Unicode code points identical to that given by the CONSTANT_String_info structure, then the result of string literal derivation is a reference to that same instance of class String.

  • Otherwise, a new instance of class String is created containing the sequence of Unicode code points given by the CONSTANT_String_info structure; a reference to that class instance is the result of string literal derivation. Finally, the intern method of the new String instance is invoked.

Bytecode

Let's decompile some OpenJDK 7 bytecode to see interning in action.

If we decompile:

public class StringPool {
    public static void main(String[] args) {
        String a = "abc";
        String b = "abc";
        String c = new String("abc");
        System.out.println(a);
        System.out.println(b);
        System.out.println(a == c);
    }
}

we have on the constant pool:

#2 = String             #32   // abc
[...]
#32 = Utf8               abc

and main:

 0: ldc           #2          // String abc
 2: astore_1
 3: ldc           #2          // String abc
 5: astore_2
 6: new           #3          // class java/lang/String
 9: dup
10: ldc           #2          // String abc
12: invokespecial #4          // Method java/lang/String."<init>":(Ljava/lang/String;)V
15: astore_3
16: getstatic     #5          // Field java/lang/System.out:Ljava/io/PrintStream;
19: aload_1
20: invokevirtual #6          // Method java/io/PrintStream.println:(Ljava/lang/String;)V
23: getstatic     #5          // Field java/lang/System.out:Ljava/io/PrintStream;
26: aload_2
27: invokevirtual #6          // Method java/io/PrintStream.println:(Ljava/lang/String;)V
30: getstatic     #5          // Field java/lang/System.out:Ljava/io/PrintStream;
33: aload_1
34: aload_3
35: if_acmpne     42
38: iconst_1
39: goto          43
42: iconst_0
43: invokevirtual #7          // Method java/io/PrintStream.println:(Z)V

Note how:

  • 0 and 3: the same ldc #2 constant is loaded (the literals)
  • 12: a new string instance is created (with #2 as argument)
  • 35: a and c are compared as regular objects with if_acmpne

The representation of constant strings is quite magic on the bytecode: as the constant pool listing above shows, the String entry (#2) simply points to a Utf8 entry (#32) that holds the data, whereas a regular object such as new String("abc") is built with new and a constructor call. The JVMS quote above seems to say that whenever the Utf8 pointed to is the same, identical instances are loaded by ldc.

I have done similar tests for fields, and:

  • static final String s = "abc" points to the constant table through the ConstantValue attribute
  • non-final fields don't have that attribute, but can still be initialized with ldc

Conclusion: there is direct bytecode support for the string pool, and the memory representation is efficient.

Bonus: compare that to the Integer pool, which does not have direct bytecode support (i.e. no CONSTANT_String_info analogue).
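For comparison, here is a short sketch of the Integer side, assuming "Integer pool" refers to the cache behind Integer.valueOf (which autoboxing also uses). The class name IntegerCacheDemo and its helper are hypothetical. Values from -128 to 127 are always cached; for larger values the result depends on JVM settings, so no output is asserted for them.

```java
class IntegerCacheDemo {
    // Boxes the same int twice through Integer.valueOf and compares the
    // resulting references by identity.
    static boolean sameBoxedInstance(int value) {
        Integer a = Integer.valueOf(value);
        Integer b = Integer.valueOf(value);
        return a == b;
    }

    public static void main(String[] args) {
        // Always true: -128..127 must be served from the cache.
        System.out.println(sameBoxedInstance(127));
        // Usually false with default settings, but not guaranteed either way.
        System.out.println(sameBoxedInstance(1000));
        // equals() compares values, so it is true regardless of caching.
        System.out.println(Integer.valueOf(1000).equals(Integer.valueOf(1000)));
    }
}
```

Unlike string literals, this sharing is done by library code at run time, not by a dedicated constant pool entry type.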

Update for Java 8 and later. In Java 8, PermGen (Permanent Generation) space is removed and replaced by Metaspace. The String pool memory is in the heap of the JVM (as noted above, this move happened in Java 7).

Compared with Java 7, the String pool size in the heap is increased. Therefore, you have more space for interned Strings, but less memory for the rest of the application.

One more thing: as you already know, when comparing two object references in Java, '==' compares the references, while 'equals' compares the contents of the objects.

Let's check this code:

String value1 = "70";
String value2 = "70";
String value3 = new Integer(70).toString();

Result:

value1 == value2 ---> true
value1 == value3 ---> false
value1.equals(value3) ---> true
value1 == value3.intern() ---> true

That's why you should use 'equals' to compare two String objects. And that is how intern() is useful.

Since strings are objects and since all objects in Java are always stored only in the heap space, all strings are stored in the heap space. However, Java keeps strings created without using the new keyword in a special area of the heap space, which is called "string pool". Java keeps the strings created using the new keyword in the regular heap space.

The purpose of the string pool is to maintain a set of unique strings. Any time you create a new string without using the new keyword, Java checks whether the same string already exists in the string pool. If it does, Java returns a reference to the same String object and if it does not, Java creates a new String object in the string pool and returns its reference. So, for example, if you use the string "hello" twice in your code as shown below, you will get a reference to the same string. We can actually test this theory out by comparing two different reference variables using the == operator as shown in the following code:

String str1 = "hello";
String str2 = "hello";
System.out.println(str1 == str2); //prints true

String str3 = new String("hello");
String str4 = new String("hello");

System.out.println(str1 == str3); //prints false
System.out.println(str3 == str4); //prints false 

The == operator simply checks whether two references point to the same object and returns true if they do. In the above code, str2 gets the reference to the same String object which was created earlier. However, str3 and str4 get references to two entirely different String objects. That is why str1 == str2 returns true but str1 == str3 and str3 == str4 return false. In fact, when you do new String("hello"); two String objects are created instead of just one if this is the first time the string "hello" is used anywhere in the program: one in the string pool because of the use of a quoted string, and one in the regular heap space because of the use of the new keyword.

String pooling is Java's way of saving program memory by avoiding the creation of multiple String objects containing the same value. For a string created using the new keyword, it is possible to get the corresponding string from the string pool by using String's intern method. This is called "interning" of string objects. For example,

String str1 = "hello";
String str2 = new String("hello");
String str3 = str2.intern(); //get an interned string obj

System.out.println(str1 == str2); //prints false
System.out.println(str1 == str3); //prints true

OCP Java SE 11 Programmer, Deshmukh

String interning is an optimization technique by the compiler. If you have two identical string literals in one compilation unit, the generated code ensures that only one string object is created for all instances of that literal (characters enclosed in double quotes) within the assembly.

I am from a C# background, so I can explain by giving an example from that:

object obj = "Int32";
string str1 = "Int32";
string str2 = typeof(int).Name;

Output of the following comparisons:

Console.WriteLine(obj == str1); // true
Console.WriteLine(str1 == str2); // true    
Console.WriteLine(obj == str2); // false !?

Note 1: Objects are compared by reference.

Note 2: typeof(int).Name is evaluated by reflection, so it is not evaluated at compile time. Which == is used (reference comparison for object, value comparison for string) is decided at compile time from the static types of the operands.

Analysis of the results:

1) true because they both contain the same literal, so the generated code has only one object referencing "Int32". See Note 1.

2) true because the contents of both values are compared, and they are the same.

3) false because str2 and obj do not refer to the same literal. See Note 2.

Java's intern() method checks whether an equal String object is already present in the SCP (string constant pool). If it is, intern() returns that object; if not, it adds the string to the SCP and returns a reference to it.

For example:

String s1 = new String("abc");
String s2 = "abc";
String s3 = "abc";

s1 == s2 // false: s1 refers to an object on the heap, while the literal "abc" lives in the SCP and s2 refers to that SCP object
s2 == s3 // true

Now if we call intern() on s1:

s1 = s1.intern();

The JVM checks whether the pool contains a string with the value "abc". Since it does, a reference to that string is returned.
Notice that we are calling s1 = s1.intern(), so s1 now refers to the string pool object with the value "abc".
At this point, all three variables refer to the same object in the string pool. Hence s1 == s2 now returns true.

Given a reference to a heap object, use the intern() method to obtain the corresponding SCP object reference.

Example:

class InternDemo {
    public static void main(String[] args) {
        String s1 = new String("smith");
        String s2 = s1.intern();
        String s3 = "smith";
        System.out.println(s2 == s3); // true
    }
}

[Image: intern flow chart]
