Why can't ConcurrentHashMap have a lock for each bucket?

Original post 2014-08-20 17:35:37 5 3 java/ multithreading/ concurrency/ concurrenthashmap

As we know, Java's ConcurrentHashMap has a number of internal locks, each of which guards some region of the bucket array.

The question is: why can't we create a lock for each bucket?

A similar question was already asked: Disadvantage of increasing number of partition in Java ConcurrentHashMap?

According to the answer, there are several reasons:

The maximum number of threads running simultaneously is limited by the number of processor cores. Is this correct? Can we ALWAYS state that if we have an 8-core processor we do not need more than 8 locked regions in ConcurrentHashMap?
L2 cache is wasted. Why?
Memory is wasted. This looks like a consequence of creating the additional locks.

Are there any other reasons?

3 answers

Hopefully I do a decent job of explaining... kind of rushed at the moment...

The answer to your first question:

"why cannot we create a lock for each bucket?"

is that you can create a lock for each bucket - it just isn't necessarily the best course of action.

The answer to your question:

"Can we ALWAYS state that if we have 8-core processor we do not need more than 8 locked regions in ConcurrentHashMap"

is technically "No", though it depends on what you mean by "need". Having a number of regions that matches your system's maximum concurrency, or is slightly greater, does not necessarily prevent contention, but in practice it works pretty well. There's nothing stopping two threads from attempting to access the same region at the same time, even if there are other regions that aren't locked.

What having 8 or more regions on an 8-core processor does guarantee is that there are enough regions for every running thread to work on a different one at the same time, without contention. With 8 cores (no Hyper-Threading) you can perform at most 8 operations at once. Even then, the ideal number of regions might be higher than the number of cores (say, 16), because the extra regions make contention less likely at a low cost (only 8 additional locks).

The benefit of additional regions eventually diminishes as the number of regions grows relative to your maximum concurrency, at which point they become a waste of space (memory), as mentioned in the Javadoc. It's a balance between the likelihood of contention (given a lock on one region, what is the probability that another thread will attempt to access it) and wasted space.
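To make the trade-off concrete, here is a minimal sketch of lock striping, assuming a fixed number of regions chosen at construction time. StripedMap and regionOf are hypothetical names made up for this illustration; this is not how the JDK implements ConcurrentHashMap, only the general idea of "one lock per region".

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical illustration of lock striping; not the JDK's implementation.
class StripedMap<K, V> {
    private final ReentrantLock[] locks;  // one lock per region
    private final Map<K, V>[] regions;    // region i is guarded by locks[i]

    @SuppressWarnings("unchecked")
    StripedMap(int regionCount) {
        locks = new ReentrantLock[regionCount];
        regions = (Map<K, V>[]) new Map[regionCount];
        for (int i = 0; i < regionCount; i++) {
            locks[i] = new ReentrantLock();
            regions[i] = new HashMap<>();
        }
    }

    // Map a key to its region; floorMod keeps the index non-negative.
    int regionOf(Object key) {
        return Math.floorMod(key.hashCode(), locks.length);
    }

    V put(K key, V value) {
        int r = regionOf(key);
        locks[r].lock();  // only threads hitting the same region contend here
        try {
            return regions[r].put(key, value);
        } finally {
            locks[r].unlock();
        }
    }

    V get(K key) {
        int r = regionOf(key);
        locks[r].lock();
        try {
            return regions[r].get(key);
        } finally {
            locks[r].unlock();
        }
    }
}
```

Raising regionCount lowers the chance that two threads pick the same lock, but every extra region costs one more lock object, which is the memory side of the balance described above.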

There are a few other factors that affect the performance of a ConcurrentHashMap:

Execution time of locked code - it's good practice to make locked code sections small so that they complete quickly and release their locks. The more quickly locks are released, the more quickly contention is resolved.
Distribution of data - nicely-distributed data tends to perform better under high concurrency. Having all of your data clustered within a single region means that you will always encounter contention.
Data access pattern - accessing different regions of data at the same time will perform better, as your threads won't be contending for resource locks. Having nicely-distributed data doesn't matter if you only attempt to access one region at a time.
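The distribution point can be illustrated with a small sketch. It assumes, as a simplification, that a key's region is its hash code modulo the region count; RegionSpread and regionsTouched are hypothetical names for this example, and the real ConcurrentHashMap spreads hash bits differently.

```java
import java.util.HashSet;
import java.util.Set;

class RegionSpread {
    // Count how many distinct regions a set of int keys touches, assuming
    // region = floorMod(hashCode, regionCount) (a simplification).
    static int regionsTouched(int[] keys, int regionCount) {
        Set<Integer> touched = new HashSet<>();
        for (int key : keys) {
            touched.add(Math.floorMod(Integer.hashCode(key), regionCount));
        }
        return touched.size();
    }

    public static void main(String[] args) {
        int[] spread = new int[16];
        int[] clustered = new int[16];
        for (int i = 0; i < 16; i++) {
            spread[i] = i;          // hash codes 0..15
            clustered[i] = i * 16;  // every hash code is a multiple of 16
        }
        System.out.println(regionsTouched(spread, 16));     // prints 16
        System.out.println(regionsTouched(clustered, 16));  // prints 1
    }
}
```

With the clustered keys, all 16 regions exist but only one is ever used, so every writer queues on the same lock no matter how many regions were configured.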

No matter how many regions there are, all three of those things can positively or negatively affect performance, and can make the number of regions less relevant. Because they play such a big part, significantly more regions are less likely to help you in general. Since you can only execute so many threads at the same time, a better focus is making sure threads complete their work quickly and release their locks.

As to your question about the cache: I'm honestly not sure, but I can take a guess. When you're using the map heavily, those locks will end up in the cache and take up space, potentially evicting other things that could be more useful. Cache is much scarcer than main memory, and cache misses waste a lot of time. I think the idea here is a general aversion to putting lots of things in the cache that don't offer a significant benefit. Taken to the extreme: if the cache is (somehow) filled with locks and every data access goes out to main memory, you take a performance hit.

"Can we ALWAYS state that if we have 8-core processor we do not need more than 8 locked regions in ConcurrentHashMap?"

No, this is completely wrong. It depends on two factors, the number of threads (concurrency) and the number of segment collisions. If two threads compete for the same segment, one thread might block the other.

While you can have only as many threads owning a core as you have cores, the big mistake in the above statement is to assume that a thread not running on a core can't own a lock. A thread owning a lock can still lose the CPU on a task switch to another thread, which then gets blocked when it tries to acquire the same lock.
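A small sketch of that point: a thread that is off the CPU still owns its lock. Here the owner parks itself on a latch, which stands in for an involuntary task switch (preemption cannot be forced deterministically from Java code). ParkedOwnerDemo and lockHeldWhileOwnerParked are hypothetical names for this example.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.locks.ReentrantLock;

class ParkedOwnerDemo {
    // Returns true if the calling thread could not take the lock
    // while the owning thread was parked.
    static boolean lockHeldWhileOwnerParked() throws InterruptedException {
        ReentrantLock regionLock = new ReentrantLock();
        CountDownLatch acquired = new CountDownLatch(1);
        CountDownLatch release = new CountDownLatch(1);

        Thread owner = new Thread(() -> {
            regionLock.lock();
            try {
                acquired.countDown();
                release.await();  // parked: not running, yet still the lock owner
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                regionLock.unlock();
            }
        });
        owner.start();

        acquired.await();                         // owner now holds the lock
        boolean blocked = !regionLock.tryLock();  // fails although the owner is not running
        release.countDown();
        owner.join();
        return blocked;
    }
}
```

So the number of lock owners is bounded by the number of threads, not by the number of cores, which is why "8 cores, therefore 8 regions" does not follow.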

That said, it's not unusual to match the number of threads to the number of cores, especially for computationally intensive tasks. So in typical setups the concurrency level of a ConcurrentHashMap depends indirectly on the number of cores.

Having a lock for each bucket would imply maintaining a lock state and a waiting queue for each bucket, which adds up to quite a lot of resources. Keep in mind that the lock is only required for concurrent write operations, not for reading threads.

However, for the Java 8 implementation this consideration is obsolete. It uses a wait-free algorithm for bucket updates, at least for buckets without collisions. This is a bit like having a lock per bucket, since threads operating on different buckets do not interfere with each other, but without the overhead of maintaining a lock state and wait queue. The only thing to care about is giving the map an appropriate initial size. Consequently, the concurrencyLevel, if specified, is used as an initial sizing hint but is otherwise ignored.
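As a rough illustration, the sketch below builds a Java 8+ ConcurrentHashMap with the three-argument constructor (initialCapacity, loadFactor, concurrencyLevel) and has several threads update it with merge. MergeCounterDemo, countConcurrently, and the key names are made up for this example, and the capacity of 64 is an arbitrary choice.

```java
import java.util.concurrent.ConcurrentHashMap;

class MergeCounterDemo {
    // Each thread increments its own key and one shared key; returns the shared total.
    static int countConcurrently(int threads, int incrementsPerThread) throws InterruptedException {
        // initialCapacity = 64, loadFactor = 0.75f, concurrencyLevel = threads.
        ConcurrentHashMap<String, Integer> counts = new ConcurrentHashMap<>(64, 0.75f, threads);
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            String key = "worker-" + t;
            workers[t] = new Thread(() -> {
                for (int i = 0; i < incrementsPerThread; i++) {
                    counts.merge(key, 1, Integer::sum);       // per-thread key
                    counts.merge("shared", 1, Integer::sum);  // key every thread updates
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            w.join();
        }
        return counts.get("shared");
    }
}
```

Because merge is atomic per key, the shared counter ends at threads * incrementsPerThread even though no region count was tuned; the concurrencyLevel argument here only influences initial sizing.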

Java 8's ConcurrentHashMap does put a lock on each bucket. Writes take that lock, but reads can still proceed concurrently.
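The sketch below tries to show that behaviour on Java 8 or later: a writer is held inside compute for a key while another thread reads the same key without blocking. ReadDuringWriteDemo and readWhileWriting are hypothetical names, and blocking inside compute is done only for demonstration; the Javadoc advises keeping such functions short.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;

class ReadDuringWriteDemo {
    // Returns { value read while the update was in progress, value read afterwards }.
    static int[] readWhileWriting() throws InterruptedException {
        ConcurrentHashMap<String, Integer> map = new ConcurrentHashMap<>();
        map.put("k", 1);
        CountDownLatch inUpdate = new CountDownLatch(1);
        CountDownLatch finish = new CountDownLatch(1);

        Thread writer = new Thread(() -> map.compute("k", (key, old) -> {
            inUpdate.countDown();
            try {
                finish.await();  // hold the update open (demonstration only)
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            return old + 1;
        }));
        writer.start();

        inUpdate.await();
        int during = map.get("k");  // does not block; still sees the old value
        finish.countDown();
        writer.join();
        int after = map.get("k");   // sees the updated value
        return new int[] { during, after };
    }
}
```

If get had to take the same lock as the writer, the read in the middle would never return, because the writer is waiting for the reader to release it.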