Java String.intern() uses HashTable instead of ConcurrentHashMap

Question

I am researching String.intern(), and this method has a performance penalty. I compared String.intern() with ConcurrentHashMap.putIfAbsent(s, s) in a microbenchmark, using Java 1.8.0_212 on Ubuntu 18.04.2 LTS.

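import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.TimeUnit;

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.OutputTimeUnit;
import org.openjdk.jmh.annotations.Param;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.Setup;
import org.openjdk.jmh.annotations.State;
import org.openjdk.jmh.infra.Blackhole;

// Class name, benchmark mode and time unit are inferred from the results table.
@State(Scope.Benchmark)
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.MICROSECONDS)
public class MyBenchmark {

    // Number of strings interned per benchmark invocation.
    @Param({"1", "100", "10000", "1000000"})
    private int size;

    private StringIntern stringIntern;
    private ConcurrentHashMapIntern concurrentHashMapIntern;

    @Setup
    public void setup() {
        stringIntern = new StringIntern();
        concurrentHashMapIntern = new ConcurrentHashMapIntern();
    }

    // Thin wrapper around the JVM's native string pool.
    public static class StringIntern {
        public String intern(String s) {
            return s.intern();
        }
    }

    // Map-based pool; entries are strongly referenced and never removed.
    public static class ConcurrentHashMapIntern {
        private final Map<String, String> map;

        public ConcurrentHashMapIntern() {
            map = new ConcurrentHashMap<>();
        }

        public String intern(String s) {
            String existString = map.putIfAbsent(s, s);
            return (existString == null) ? s : existString;
        }
    }

    @Benchmark
    public void intern(Blackhole blackhole) {
        for (int count = 0; count < size; count++) {
            blackhole.consume(stringIntern.intern("Example " + count));
        }
    }

    @Benchmark
    public void concurrentHashMapIntern(Blackhole blackhole) {
        for (int count = 0; count < size; count++) {
            blackhole.consume(concurrentHashMapIntern.intern("Example " + count));
        }
    }
}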

The results are as expected: ConcurrentHashMap is faster than String.intern() at looking up strings.

Benchmark                             (size)  Mode  Cnt        Score        Error  Units
MyBenchmark.concurrentHashMapIntern        1  avgt    5        0.056 ±      0.007  us/op
MyBenchmark.concurrentHashMapIntern      100  avgt    5        6.094 ±      2.359  us/op
MyBenchmark.concurrentHashMapIntern    10000  avgt    5      787.802 ±    264.179  us/op
MyBenchmark.concurrentHashMapIntern  1000000  avgt    5   136504.010 ±  17872.866  us/op
MyBenchmark.intern                         1  avgt    5        0.129 ±      0.007  us/op
MyBenchmark.intern                       100  avgt    5       13.700 ±      2.404  us/op
MyBenchmark.intern                     10000  avgt    5     1618.514 ±    460.563  us/op
MyBenchmark.intern                   1000000  avgt    5  1027915.854 ± 638910.023  us/op

String.intern() is slower than ConcurrentHashMap because String.intern() is a native HashTable implementation. I then read the Javadoc for Hashtable, which says:

If a thread-safe implementation is not needed, it is recommended to use HashMap in place of Hashtable. If a thread-safe highly-concurrent implementation is desired, then it is recommended to use ConcurrentHashMap in place of Hashtable.

This is a very confusing situation. The Javadoc recommends ConcurrentHashMap, yet String.intern() uses HashTable despite the performance penalty. Does anyone have any idea why a native HashTable implementation is used instead of ConcurrentHashMap?

Answer 1

There are a number of things going on here:

Your benchmarks have very large error bars. The repeat counts are probably too small. This makes the results questionable.
It doesn't look like your benchmarks are resetting the "interned string" caches after each run ¹. So that means that the caches are growing, and each repetition will be starting with different conditions. This may explain the error bars ...
Your ConcurrentHashMap is not functionally equivalent to String::intern. The latter uses a native equivalent of Reference objects to ensure that interned strings can be garbage collected. Your ConcurrentHashMap implementation doesn't. Why does this matter?
- Your ConcurrentHashMap is a massive memory leak.
- A reference mechanism is expensive ... at GC time.
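
To illustrate the gap, here is a minimal sketch of what a map-based interner would need in order to let unused strings be garbage collected. The class name WeakInterner is hypothetical, and this is not how the JVM implements String::intern; it only shows the extra Reference machinery that the plain ConcurrentHashMap version skips. It uses coarse synchronization for simplicity, so it is not a high-concurrency design.

```java
import java.lang.ref.WeakReference;
import java.util.Map;
import java.util.WeakHashMap;

/** Hypothetical weak interner: an entry can be collected once nothing else holds the string. */
public class WeakInterner {
    // WeakHashMap holds its keys weakly. The value must be weak as well,
    // otherwise the value's strong reference to the key would pin the entry forever.
    private final Map<String, WeakReference<String>> pool = new WeakHashMap<>();

    public synchronized String intern(String s) {
        WeakReference<String> ref = pool.get(s);
        String canonical = (ref == null) ? null : ref.get();
        if (canonical == null) {
            // First sighting, or the previous canonical copy has been collected.
            pool.put(s, new WeakReference<>(s));
            canonical = s;
        }
        return canonical;
    }

    public synchronized int size() {
        return pool.size();
    }

    public static void main(String[] args) {
        WeakInterner interner = new WeakInterner();
        String a = new String("Example 1");
        String b = new String("Example 1");
        System.out.println(a == b);                                   // false: distinct objects
        System.out.println(interner.intern(a) == interner.intern(b)); // true: one canonical object
    }
}
```

Every pooled string now costs an extra WeakReference object plus the reference-processing work the collector does for it, which is the kind of overhead the ConcurrentHashMap benchmark never pays.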

"String.intern() is slower than ConcurrentHashMap because String.intern() is a native HashTable implementation."

No. The real reason is that the native implementation is doing things differently:

There may be a JNI call overhead when you call String::intern.
The internal representations are different.
It has to handle references, which impacts GC performance.
There are also behind-the-scenes interactions with string deduping and other things.

Note that these things vary considerably across different Java versions.

"This is a very confusing situation. The Javadoc recommends ConcurrentHashMap, yet String.intern() uses HashTable despite the performance penalty."

Now you are talking about a different scenario that is not relevant to what you are doing.

Note that String::intern doesn't use either HashTable or HashMap; see above.
The quote that you found is about how to get good concurrent performance from a hash table. Your benchmark is (AFAIK) single threaded. For a serial use case, HashMap will give better performance than the others.
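
For the serial case, a plain HashMap version might look like the sketch below. The class name HashMapInterner and its clear() method are hypothetical additions for illustration. The class is not thread-safe, and like the ConcurrentHashMap version it holds every string strongly, so it is still a memory leak unless the pool is cleared.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical single-threaded interner; NOT safe for concurrent use. */
public class HashMapInterner {
    private final Map<String, String> pool = new HashMap<>();

    public String intern(String s) {
        // putIfAbsent returns the previously pooled copy, or null if s was just added.
        String existing = pool.putIfAbsent(s, s);
        return (existing == null) ? s : existing;
    }

    /** Drops all entries, e.g. so that each benchmark repetition starts from an empty pool. */
    public void clear() {
        pool.clear();
    }

    public int size() {
        return pool.size();
    }

    public static void main(String[] args) {
        HashMapInterner interner = new HashMapInterner();
        for (int count = 0; count < 100; count++) {
            interner.intern("Example " + count);
        }
        System.out.println(interner.size()); // 100
        interner.clear();
        System.out.println(interner.size()); // 0
    }
}
```

Unlike the native string pool, a pool like this can be emptied between repetitions, which removes one source of run-to-run variation from the map-based side of the comparison.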

"Does anyone have any idea why a native HashTable implementation is used instead of ConcurrentHashMap?"

It doesn't use a hash table; see above. There are a number of reasons that it doesn't use HashTable, HashMap or ConcurrentHashMap:

It pays more attention to memory utilization. All of the Java hash table implementations are memory hungry, and that makes them unsuitable for general-purpose string interning.
The memory and CPU overheads of using Reference classes are significant.
Computing a hash of a newly created string of length N is O(N), which will be significant when interning strings that may be hundreds or thousands of characters long.
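
The O(N) point can be made concrete with the hash function documented for String.hashCode(), which visits every character once. The sketch below (class and method names are hypothetical) recomputes that hash by hand. A String caches its hash after the first call, but a freshly built string such as "Example " + count has no cached value yet, so each map lookup in the benchmark pays for the full pass.

```java
public class StringHashCost {
    /** Recomputes the documented String hash: s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1]. */
    static int manualHash(String s) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h = 31 * h + s.charAt(i); // one step per character, so the cost grows linearly
        }
        return h;
    }

    public static void main(String[] args) {
        int count = 42;
        String s = "Example " + count; // new string, hash not yet cached
        System.out.println(manualHash(s) == s.hashCode()); // true
    }
}
```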

Finally, be careful that you are not focusing on the wrong problem here. If you are trying to optimize interning because it is a bottleneck in your application, the other strategy is to not intern at all. In practice, it rarely saves memory (especially compared with G1GC's string de-duping) and rarely improves string handling performance.

In summary:

You are comparing apples and oranges. Your map-based implementation is not equivalent to native interning.
String::intern is not optimized solely (or even primarily) for speed.
By focusing on speed, you are ignoring memory utilization ... and the secondary effect of memory utilization on speed.
Consider the potential optimization of not interning at all.

¹ And in the native intern case, I don't think that is possible.

Answer 1
3 Accepted 2019-05-19 00:28:48
