
Why String Deduplication when we have String Pool

String De-duplication:

Strings consume a lot of memory in any application. Whenever the garbage collector visits String objects, it takes note of the char arrays. It takes their hash value and stores it alongside a weak reference to the array. As soon as it finds another String which has the same hash code, it compares them char by char. If they match as well, one String will be modified to point to the char array of the second String. The first char array is then no longer referenced and can be garbage collected.

String Pool:

All strings used by the Java program are stored here. If two variables are initialized to the same string value, two strings are not created in memory; only one copy is stored, and both variables point to the same memory location.

So Java already takes care of not creating duplicate strings in the heap by checking if the string exists in the string pool. Then what is the purpose of string de-duplication?

If there is code as follows:

    String myString_1 = new String("Hello World");
    String myString_2 = new String("Hello World");

two strings are created in memory even though they are the same. I cannot think of any scenario other than this where string de-duplication is useful. Obviously I must be missing something. What am I missing?

Thanks in advance

The string pool applies only to strings added to it explicitly, or used as constants in the application. It does not apply to strings created dynamically during the lifetime of the application. String deduplication, however, applies to all strings.
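To make the distinction concrete, here is a minimal sketch (the class and method names are made up for illustration) comparing two identical literals with a string that is assembled at run time. It uses `==` on purpose, because the point is object identity rather than content.

```java
class PoolScopeDemo {
    // Two identical literals are compile-time constants, so both variables
    // refer to the single pooled String object.
    static boolean literalsShareObject() {
        String a = "Hello World";
        String b = "Hello World";
        return a == b; // reference comparison, not equals()
    }

    // A string assembled at run time is a fresh heap object, even though
    // its content equals the pooled literal.
    static boolean runtimeStringSharesObject() {
        String literal = "Hello World";
        String dynamic = new StringBuilder("Hello").append(" World").toString();
        return dynamic.equals(literal) && dynamic == literal;
    }

    public static void main(String[] args) {
        System.out.println("literals share one object: " + literalsShareObject());
        System.out.println("run-time string shares it: " + runtimeStringSharesObject());
    }
}
```

The first check should report true and the second false: the pool only helps with the literals.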

String de-duplication benefits from the extra level of indirection built into String:

  • With a string pool, you are limited to returning the same object for two identical strings.
  • String de-duplication lets you have multiple distinct String objects sharing the same content.

This removes the limitation of de-duplicating on creation: your application could keep creating new String objects with identical content while using very little extra memory, because the content of the strings would be shared. This process can be done on a completely unrelated schedule, for example in the background, while your application does not need much of the CPU resources. Since the identity of the String object does not change, de-duplication can be completely hidden from your application.
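The sketch below is only an illustration of the "distinct objects, same content" situation; the class and method names are hypothetical. It builds many equal strings at run time and counts them by identity and by content. On HotSpot, de-duplication is opt-in via the `-XX:+UseStringDeduplication` flag (originally a G1 feature, so typically combined with `-XX:+UseG1GC`). Whether and when the collector actually merges the backing arrays cannot be observed from this code, which is exactly the point about it being hidden from the application.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

class DedupIdentityDemo {
    // Builds n String objects with identical content, each created at run time.
    static List<String> buildDuplicates(int n) {
        List<String> out = new ArrayList<>(n);
        for (int i = 0; i < n; i++) {
            out.add(new StringBuilder("Hello").append(" World").toString());
        }
        return out;
    }

    // Counts distinct objects by identity (==), ignoring content.
    static int countDistinctObjects(List<String> strings) {
        Set<String> identities = Collections.newSetFromMap(new IdentityHashMap<>());
        identities.addAll(strings);
        return identities.size();
    }

    // Counts distinct values by content (equals/hashCode).
    static int countDistinctValues(List<String> strings) {
        return new HashSet<>(strings).size();
    }

    public static void main(String[] args) {
        List<String> strings = buildDuplicates(1000);
        System.out.println("distinct objects: " + countDistinctObjects(strings));
        System.out.println("distinct values:  " + countDistinctValues(strings));
    }
}
```

Running it as `java -XX:+UseG1GC -XX:+UseStringDeduplication DedupIdentityDemo.java` should print the same counts as running it without the flags; only the memory held by the duplicated contents may differ.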

Compile time vs run time

The string pool refers to string constants that are known at compile time.

String deduplication would help you if you happen to retrieve (or construct) the same string a million times at run time, e.g. reading it from a file, an HTTP request or any other way.

Just to add to the answers above: on older VMs the string pool is not garbage collected (this has changed now, but don't rely on that). It contains strings which are used as constants in the application, and so will always be needed. If you continually put all your strings in the string pool, you might quickly run out of memory. On top of that, de-duplication is a relatively expensive process, which is wasted effort if you know you only need the string for a very short period of time and you have enough memory.

For these reasons, strings are not put in the string pool automatically. You have to do it explicitly by calling string.intern().
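As a small illustration (the class and method names are made up), the following sketch shows that a string built at run time stays outside the pool until intern() is called on it, at which point the pooled instance is returned:

```java
class InternDemo {
    // True when the run-time string is a separate object from the literal,
    // yet intern() hands back the pooled literal instance.
    static boolean internReturnsPooledInstance() {
        String literal = "Hello World";
        String dynamic = new StringBuilder("Hello").append(" World").toString();
        boolean separateBeforeIntern = dynamic != literal;
        boolean sameAfterIntern = dynamic.intern() == literal;
        return separateBeforeIntern && sameAfterIntern;
    }

    public static void main(String[] args) {
        System.out.println("intern() returns the pooled instance: "
                + internReturnsPooledInstance());
    }
}
```

Note that intern() does not change the object it is called on; the caller has to keep the returned reference to benefit from the pool.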

I cannot think of any scenario other than this where string de-duplication is useful.

Well, one other (much more frequent) scenario is the use of StringBuilders. The toString() method of the StringBuilder class clearly creates a new instance in memory:

public final class StringBuilder extends AbstractStringBuilder
                                 implements java.io.Serializable, CharSequence
{
    ...

    @Override
    public String toString() {
       // Create a copy, don't share the array
       return new String(value, 0, count);
    }

    ...

}

The same goes for its thread-safe version, StringBuffer:

public final class StringBuffer extends AbstractStringBuilder
                                implements java.io.Serializable, CharSequence
{
   ...

   @Override
   public synchronized String toString() {
       if (toStringCache == null) {
           toStringCache = Arrays.copyOfRange(value, 0, count);
       }
       return new String(toStringCache, true);
   }

   ...
}

In applications that rely heavily on this, string de-duplication may reduce memory usage.
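The effect of the toString() implementations above can be checked with a short sketch (names are hypothetical): calling toString() twice on the same non-empty builder yields two equal but distinct String objects, which is the kind of duplicate that de-duplication targets.

```java
class BuilderCopyDemo {
    // True when two toString() calls on one builder give equal content
    // held by two different String objects.
    static boolean toStringCreatesNewObjects() {
        StringBuilder sb = new StringBuilder("Hello World");
        String first = sb.toString();
        String second = sb.toString();
        return first.equals(second) && first != second;
    }

    public static void main(String[] args) {
        System.out.println("equal content, different objects: "
                + toStringCreatesNewObjects());
    }
}
```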

From the documentation:

"Initializes a newly created String object so that it represents the same sequence of characters as the argument; in other words, the newly created string is a copy of the argument string. Unless an explicit copy of original is needed, use of this constructor is unnecessary since Strings are immutable."

So my sense is that this constructor of the String class is not normally needed the way you have used it above. I guess that constructor is provided merely for the sake of completeness, or for when you do not want to share that copy (kind of unnecessary now, refer here for what I am talking about). Other constructors are still useful, such as the one that creates a String object from a char array, and so on.
