
Performance: ConcurrentHashMap vs HashMap

Original post: 2009-09-04 10:00:19 7 8 java/ collections/ hashmap

How does the performance of ConcurrentHashMap compare to HashMap, especially for the .get() operation? (I'm especially interested in the case of only a few items, in the range of maybe 0-5000.)

Is there any reason not to use ConcurrentHashMap instead of HashMap?

(I know that null values aren't allowed.)

Update

Just to clarify: obviously the performance will suffer under actual concurrent access, but how does the performance compare when there is no concurrent access?

8 answers

I was really surprised to find that this topic is so old and yet no one has provided any tests for this case. Using ScalaMeter, I created tests of add, get and remove for both HashMap and ConcurrentHashMap in two scenarios:

using a single thread
using as many threads as I have cores available. Note that because HashMap is not thread-safe, I simply created a separate HashMap for each thread, but used one shared ConcurrentHashMap.

Code is available on my repo.
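The repo itself is not reproduced here. As a rough idea of what the single-thread scenario measures, here is a minimal sketch in plain Java rather than ScalaMeter. The class and method names are made up for illustration, and it does no warm-up or repeated runs, so its numbers are only indicative; a harness such as ScalaMeter or JMH is the right tool for real measurements.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch, not the answer author's benchmark code.
public class MapTimingSketch {

    // Fills the map with `size` entries and returns the elapsed nanoseconds.
    static long timeAdd(Map<Integer, Integer> map, int size) {
        long start = System.nanoTime();
        for (int i = 0; i < size; i++) {
            map.put(i, i);
        }
        return System.nanoTime() - start;
    }

    // Reads every key back once; returning the sum keeps the JIT from discarding the loop.
    static long sumGets(Map<Integer, Integer> map, int size) {
        long sum = 0;
        for (int i = 0; i < size; i++) {
            sum += map.get(i);
        }
        return sum;
    }

    public static void main(String[] args) {
        int size = 5_000; // upper end of the range the question asks about
        Map<Integer, Integer> plain = new HashMap<>();
        Map<Integer, Integer> concurrent = new ConcurrentHashMap<>();

        System.out.println("HashMap add ns:           " + timeAdd(plain, size));
        System.out.println("ConcurrentHashMap add ns: " + timeAdd(concurrent, size));

        long start = System.nanoTime();
        long a = sumGets(plain, size);
        long mid = System.nanoTime();
        long b = sumGets(concurrent, size);
        long end = System.nanoTime();
        System.out.println("HashMap get ns:           " + (mid - start) + " (sum " + a + ")");
        System.out.println("ConcurrentHashMap get ns: " + (end - mid) + " (sum " + b + ")");
    }
}
```

Because both maps are passed as Map, the same timing code runs against either implementation unchanged.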

The results are as follows:

X axis (size) shows the number of elements written to the map(s)
Y axis (value) shows time in milliseconds

The summary

If you want to operate on your data as fast as possible, use all the threads available. That seems obvious: each thread has 1/nth of the full work to do.
If you choose single-threaded access, use HashMap; it is simply faster. For the add method it is as much as 3x more efficient. Only get is faster on ConcurrentHashMap, but not by much.
Operating on one ConcurrentHashMap with many threads is about as effective as operating on a separate HashMap per thread, so there is no need to partition your data into separate structures.

To sum up, ConcurrentHashMap performs worse when used from a single thread, but adding more threads to do the work will definitely speed up the process.

Testing platform

AMD FX6100, 16GB RAM
Xubuntu 16.04, Oracle JDK 8 update 91, Scala 2.11.8

Thread safety is a complex question. If you want to make an object thread safe, do it consciously, and document that choice. People who use your class will thank you if it is thread safe when that simplifies their usage, but they will curse you if an object that was once thread safe stops being so in a future version. Thread safety, while really nice, is not just for Christmas!

So now to your question:

ConcurrentHashMap (at least in Sun's current implementation) works by dividing the underlying map into a number of separate buckets (segments, in that implementation's terms). Getting an element does not require any locking per se, but it does use atomic/volatile operations, which imply a memory barrier (potentially very costly, and interfering with other possible optimisations).

Even if the JIT compiler can eliminate all the overhead of atomic operations in the single-threaded case, there is still the overhead of deciding which of the buckets to look in. Admittedly this is a relatively quick calculation, but it cannot be eliminated.

As for deciding which implementation to use, the choice is probably simple.

If this is a static field, you almost certainly want to use ConcurrentHashMap, unless testing shows this is a real performance killer. A class has different thread-safety expectations than the instances of that class.

If this is a local variable, then chances are a HashMap is sufficient, unless you know that references to the object can leak out to another thread. By coding to the Map interface, you allow yourself to change it easily later if you discover a problem.
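A minimal sketch of what coding to the Map interface can look like. The WordCounts class and its method names are hypothetical, invented for this illustration; the point is only that the concrete class is named in exactly one place.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical example class; not from the original answer.
public class WordCounts {

    // The only place that names a concrete implementation.
    static Map<String, Integer> newCounts(boolean shared) {
        return shared ? new ConcurrentHashMap<>() : new HashMap<>();
    }

    // Works with any Map; merge() is part of the Map interface since Java 8.
    static void count(Map<String, Integer> counts, String[] words) {
        for (String w : words) {
            counts.merge(w, 1, Integer::sum);
        }
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = newCounts(false); // local use: HashMap is enough
        count(counts, new String[] {"a", "b", "a"});
        System.out.println(counts.get("a"));
    }
}
```

If the map later turns out to be reachable from another thread, only newCounts has to change.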

If this is an instance field, and the class hasn't been designed to be thread safe, then document it as not thread safe, and use a HashMap.

If you know that this instance field is the only reason the class isn't thread safe, and you are willing to live with the restrictions that promising thread safety implies, then use ConcurrentHashMap, unless testing shows significant performance implications. In that case, you might consider letting a user of the class choose a thread-safe version of the object somehow, perhaps by using a different factory method.
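One way the "different factory method" idea could look, as a sketch only. Registry, create and createThreadSafe are hypothetical names, and this assumes the map field really is the only state that needs protecting.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical class; its thread safety depends on which factory method created it. */
public class Registry {
    private final Map<String, String> entries;

    private Registry(Map<String, String> backing) {
        this.entries = backing;
    }

    /** Not thread safe: confine the returned instance to one thread. */
    public static Registry create() {
        return new Registry(new HashMap<>());
    }

    /** Safe for concurrent register/lookup calls from multiple threads. */
    public static Registry createThreadSafe() {
        return new Registry(new ConcurrentHashMap<>());
    }

    public void register(String name, String value) {
        entries.put(name, value);
    }

    public String lookup(String name) {
        return entries.get(name);
    }
}
```

The Javadoc on each factory method is where the documented promise lives, so callers can see what they are getting.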

In either case, document the class as being thread safe (or conditionally thread safe), so people who use your class know they can use objects across multiple threads, and people who edit your class know that they must maintain thread safety in future.

I would recommend you measure it, since (for one reason) there may be some dependence on the hashing distribution of the particular objects you're storing.

The standard hashmap provides no concurrency protection, whereas the concurrent hashmap does. Before it was available, you could wrap the hashmap to get thread-safe access, but this was coarse-grained locking and meant all concurrent access got serialised, which could really impact performance.
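For comparison, here is a sketch of the two approaches side by side. It assumes the wrapping the answer refers to is Collections.synchronizedMap; the class name, thread count and key layout are made up for the example. Both variants are correct under concurrent writers, and the difference is only in how much the writers block each other.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical demo class; not from the original answer.
public class WrapperVsConcurrent {

    // Runs `threads` writers, each inserting `perThread` distinct keys, and returns the final size.
    static int fill(Map<Integer, Integer> map, int threads, int perThread) {
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            final int base = t * perThread;
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    map.put(base + i, i);
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for writers", e);
            }
        }
        return map.size();
    }

    public static void main(String[] args) {
        // Coarse-grained: every call synchronizes on a single monitor.
        Map<Integer, Integer> wrapped = Collections.synchronizedMap(new HashMap<>());
        // Finer-grained: writers touching different parts of the table rarely block each other.
        Map<Integer, Integer> concurrent = new ConcurrentHashMap<>();

        System.out.println(fill(wrapped, 4, 1_000));
        System.out.println(fill(concurrent, 4, 1_000));
    }
}
```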

The concurrent hashmap uses lock striping and only locks the part of the map affected by a particular operation. If you're running on a modern VM such as HotSpot, the VM will try to use lock biasing, coarsening and elision where possible, so you'll only pay the penalty for the locks when you actually need them.

In summary, if your map is going to be accessed by concurrent threads and you need to guarantee a consistent view of its state, use the concurrent hashmap.

In the case of a 1000-element hash table, using 10 locks for the whole table saves close to half the time when 10000 threads are inserting into it and 10000 threads are deleting from it.

The interesting run time difference is here

Always use a concurrent data structure, except when the downside of striping (mentioned below) becomes a frequent operation. In that case you will have to acquire all the locks. I read that the best way to do this is by recursion.

Lock striping is useful when there is a way of breaking a high-contention lock into multiple locks without compromising data integrity. Whether this is possible takes some thought, and it is not always the case. The data structure is also a contributing factor in the decision. If we use a large array to implement a hash table, synchronizing it with a single lock for the entire table forces threads to access the data structure one at a time. That is necessary if they are accessing the same location in the hash table, but what if they are accessing the two extremes of the table?

The downside of lock striping is that it is difficult to get any state of the data structure that spans the stripes. In this example, getting the size of the table, or trying to list/enumerate the whole table, may be cumbersome, since we need to acquire all of the striped locks.
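To make the trade-off concrete, here is a minimal striped map sketch. It is an illustration of the technique only, not how ConcurrentHashMap is implemented; StripedMap and stripeFor are hypothetical names, and the all-locks case uses a plain loop in index order rather than recursion.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantLock;

/** Illustrative striped map; an assumption-laden sketch, not ConcurrentHashMap's real design. */
public class StripedMap<K, V> {
    private final ReentrantLock[] locks;
    private final Map<K, V>[] stripes;

    @SuppressWarnings({"unchecked", "rawtypes"})
    public StripedMap(int stripeCount) {
        locks = new ReentrantLock[stripeCount];
        stripes = (Map<K, V>[]) new Map[stripeCount];
        for (int i = 0; i < stripeCount; i++) {
            locks[i] = new ReentrantLock();
            stripes[i] = new HashMap<>();
        }
    }

    // Maps a key to its stripe; the mask keeps the index non-negative.
    private int stripeFor(Object key) {
        return (key.hashCode() & 0x7fffffff) % locks.length;
    }

    // Single-key operations take only the one lock guarding that key's stripe.
    public V put(K key, V value) {
        int s = stripeFor(key);
        locks[s].lock();
        try {
            return stripes[s].put(key, value);
        } finally {
            locks[s].unlock();
        }
    }

    public V get(K key) {
        int s = stripeFor(key);
        locks[s].lock();
        try {
            return stripes[s].get(key);
        } finally {
            locks[s].unlock();
        }
    }

    // Whole-table operation: a consistent answer needs every stripe lock.
    public int size() {
        for (ReentrantLock lock : locks) {
            lock.lock(); // always acquired in index order, so two callers cannot deadlock
        }
        try {
            int total = 0;
            for (Map<K, V> stripe : stripes) {
                total += stripe.size();
            }
            return total;
        } finally {
            for (ReentrantLock lock : locks) {
                lock.unlock();
            }
        }
    }
}
```

With 10 stripes, put and get on keys in different stripes proceed in parallel, while size() briefly stops all writers, which is the cost described above.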

What answer are you expecting here?

It is obviously going to depend on the number of reads happening at the same time as writes, and on how long a normal map must be "locked" during a write operation in your app (and on whether you would make use of the putIfAbsent method on ConcurrentMap). Any benchmark is going to be largely meaningless.
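For readers who have not used it, putIfAbsent does the check and the insert as one atomic step, which a plain HashMap can only match by holding an external lock around both. A small sketch, with a hypothetical helper named intern:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical demo class; not from the original answer.
public class PutIfAbsentSketch {

    // Returns the value that ends up mapped to `key`: the existing one, or `candidate` if there was none.
    static String intern(ConcurrentMap<String, String> cache, String key, String candidate) {
        String previous = cache.putIfAbsent(key, candidate); // atomic check-then-insert
        return previous != null ? previous : candidate;
    }

    public static void main(String[] args) {
        ConcurrentMap<String, String> cache = new ConcurrentHashMap<>();
        System.out.println(intern(cache, "k", "first"));  // prints "first"
        System.out.println(intern(cache, "k", "second")); // prints "first": the key was already present
    }
}
```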

It's not clear what you mean. If you need thread safety, you have almost no choice: only ConcurrentHashMap. And it definitely has performance/memory penalties in the get() call: access to volatile variables and, if you're unlucky, a lock.

Of course a Map without any locking wins against one with thread-safe behavior, which needs more work. The point of the concurrent one is to be thread safe without using synchronized, and so to be faster than Hashtable. The same graphs would be very interesting for ConcurrentHashMap vs Hashtable (which is synchronized).