
Why does ConcurrentHashMap not store the size of the map in an AtomicInteger?

Original post: 2015-07-29 12:40:49. Tags: java, multithreading, atomic, volatile, concurrenthashmap

The JavaDoc for the size() method of ConcurrentHashMap states:

"Bear in mind that the results of aggregate status methods including size, isEmpty, and containsValue are typically useful only when a map is not undergoing concurrent updates in other threads."

I know that in most concurrent applications maps are essentially moving targets, so it is not that common to need the value returned by size() to be guaranteed correct. But in my case I actually do need it to be correct (not a stale value).

Seeing as I really do not want to lock the whole table for each size() call, my question is: why does ConcurrentHashMap not just store the size of the table in an AtomicInteger field that gets updated as part of each put() or remove() operation? If it did, writes to the size field would always be atomic, and (because AtomicInteger stores its value in a volatile field) any read of the size field would return the most recently written value...
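To make the proposal concrete, here is a minimal sketch of what it might look like as a wrapper. `CountingMap` is a hypothetical name, not a library class. Note that the map update and the counter update are two separate atomic steps, each correct on its own, with a window between them.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical wrapper illustrating the idea in the question:
// a separate AtomicInteger that put()/remove() adjust.
final class CountingMap<K, V> {
    private final ConcurrentHashMap<K, V> map = new ConcurrentHashMap<>();
    private final AtomicInteger size = new AtomicInteger();

    V put(K key, V value) {
        V previous = map.put(key, value);
        // Window: the new entry is already visible to other threads,
        // but the counter has not been incremented yet.
        if (previous == null) { // ConcurrentHashMap forbids null values, so null means "was absent"
            size.incrementAndGet();
        }
        return previous;
    }

    V remove(K key) {
        V previous = map.remove(key);
        // Same window in the other direction: entry gone, counter not yet decremented.
        if (previous != null) {
            size.decrementAndGet();
        }
        return previous;
    }

    boolean containsKey(K key) {
        return map.containsKey(key);
    }

    // A single volatile read: the latest counter value, but not
    // necessarily consistent with what the map contains right now.
    int size() {
        return size.get();
    }
}
```

Once all updates have finished the counter is exact, because exactly one put() sees "was absent" per new key and exactly one remove() sees "was present" per removed key. The question is what it means while updates are still running.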

Or am I missing something here?

3 answers

Yes, you are missing something. Imagine 10 threads performing updates, and an eleventh regularly determining the size of the map.

The numbers the 11th thread reads depend entirely on the execution timing of the 10 other threads. Using an atomic integer would not help here.
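A small experiment in the spirit of this scenario, as a sketch: 10 writer threads insert distinct keys while an 11th thread samples size(). The class name `SizeSampler` and the counts are arbitrary choices for illustration. The sampled sequence differs from run to run; only the final value, read after every thread has been joined, is predictable.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;

final class SizeSampler {
    // Starts `writers` threads that each insert `perWriter` distinct keys while
    // one extra reader thread samples size(). Returns the reader's samples,
    // followed by one final size() taken after every thread has finished.
    static List<Integer> run(int writers, int perWriter) {
        ConcurrentHashMap<Integer, Integer> map = new ConcurrentHashMap<>();
        List<Integer> samples = new ArrayList<>(); // touched only by the reader until join()
        CountDownLatch start = new CountDownLatch(1);
        List<Thread> threads = new ArrayList<>();
        for (int w = 0; w < writers; w++) {
            int base = w * perWriter; // each writer gets its own key range
            threads.add(new Thread(() -> {
                awaitQuietly(start);
                for (int i = 0; i < perWriter; i++) {
                    map.put(base + i, i);
                }
            }));
        }
        threads.add(new Thread(() -> {
            awaitQuietly(start);
            for (int i = 0; i < 1_000; i++) {
                samples.add(map.size()); // value depends on how far the writers got
            }
        }));
        threads.forEach(Thread::start);
        start.countDown(); // release all threads at roughly the same time
        for (Thread t : threads) {
            join(t);
        }
        samples.add(map.size()); // exact: no updates are in flight any more
        return samples;
    }

    private static void awaitQuietly(CountDownLatch latch) {
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    private static void join(Thread t) {
        try {
            t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }
}
```

Swapping map.size() for a separately maintained AtomicInteger would not make the intermediate samples any more meaningful; they would still reflect whatever point the writers happened to have reached.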

As ruediste explained, the size information is pointless if you have ongoing concurrent updates. You may get the old value, the new value, or, in the case of multiple updates, an arbitrary in-between value.

This has nothing to do with the question whether the value is stored in an atomic integer or not. But the whole point of ConcurrentHashMap is to allow concurrent updates.

“Concurrent” means there is no time relationship nor order of actions. When a thread attempts to insert a value while another one attempts to remove a value, there is a time relationship only if they are accessing the same key. In that case the removing thread can use the result to tell whether the insertion happened before the removal or not. In all other cases, the threads act independently.
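The same-key case can be sketched like this (class and key names are made up for illustration). Because both threads touch key "k", remove()'s return value tells the remover which action came first, and the final state of the map always agrees with that answer.

```java
import java.util.concurrent.ConcurrentHashMap;

final class SameKeyRace {
    // One thread inserts "k" while another removes "k". Because both actions
    // target the same key, the remover's return value reveals their order.
    // Returns true when the final state agrees with that order.
    static boolean runOnce() {
        ConcurrentHashMap<String, String> map = new ConcurrentHashMap<>();
        String[] removed = new String[1];
        Thread inserter = new Thread(() -> map.put("k", "v"));
        Thread remover = new Thread(() -> removed[0] = map.remove("k"));
        inserter.start();
        remover.start();
        join(inserter);
        join(remover);
        boolean insertedFirst = removed[0] != null;
        // Inserted first: the removal undid it, so the key is gone.
        // Removed first: remove() found nothing, so the later insert survives.
        return insertedFirst != map.containsKey("k");
    }

    private static void join(Thread t) {
        try {
            t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }
}
```

With two different keys there is no such return value to consult, which is exactly the situation described next.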

When querying the size at the same time, you may get the old size, the old size plus one, or the old size minus one (provided the initial state wasn't empty). If you get a value different from the old size, you may be tempted to use it to deduce whether the insertion or the removal happened first, and come to a wrong conclusion. A different thread using containsKey at the same time to detect whether either key is present might perceive a different order of these actions. That's what “no ordering” or “no time relationship” implies.
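A sketch of the different-key case, under the assumption that the map starts as {a=1}, so the old size is 1. One thread inserts "b", another removes "a", and a third calls size() concurrently. The class name is hypothetical. The sample can be 0, 1 or 2, and none of those values is a reliable statement about which update "happened first".

```java
import java.util.concurrent.ConcurrentHashMap;

final class DifferentKeysRace {
    // The map starts as {a=1} (old size 1). One thread inserts "b", another
    // removes "a", and a third calls size() at the same time.
    static int sampleOnce() {
        ConcurrentHashMap<String, Integer> map = new ConcurrentHashMap<>();
        map.put("a", 1);
        int[] seen = new int[1];
        Thread inserter = new Thread(() -> map.put("b", 2));
        Thread remover = new Thread(() -> map.remove("a"));
        Thread reader = new Thread(() -> seen[0] = map.size());
        inserter.start();
        remover.start();
        reader.start();
        join(inserter);
        join(remover);
        join(reader);
        // 0 (old size minus one), 1 (old size) or 2 (old size plus one).
        // Reading 2 does not prove the insert came first: the two updates
        // touch different keys and have no ordering relative to each other.
        return seen[0];
    }

    private static void join(Thread t) {
        try {
            t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }
}
```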

Maintaining an atomic integer size value wouldn't change anything. There is still a point where a thread has modified the map's content but not yet the size field, and an arbitrary number of threads might be at exactly that point. But forcing all threads to update a single atomic integer would reduce concurrency drastically. While an atomic integer is faster than a synchronized block, updating it is still a synchronization action, and it suffers under high contention. So it would reduce performance without solving any problem.
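The "point where a thread has modified the map's content but not the size field yet" can be made visible deterministically. This is a contrived sketch: the writer is artificially paused with latches between the two steps (in real code the window is just a few instructions wide, but it exists). `WindowDemo` and the latch names are hypothetical.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;

final class WindowDemo {
    // Reports what a reader observes while the writer sits between
    // "map updated" and "counter updated", plus the counter afterwards.
    static String observe() {
        ConcurrentHashMap<String, Integer> map = new ConcurrentHashMap<>();
        AtomicInteger size = new AtomicInteger();
        CountDownLatch mapUpdated = new CountDownLatch(1);
        CountDownLatch readerDone = new CountDownLatch(1);
        Thread writer = new Thread(() -> {
            map.put("k", 1);
            mapUpdated.countDown();   // the entry is now visible...
            awaitQuietly(readerDone); // ...artificially widen the window here...
            size.incrementAndGet();   // ...and only now does the counter catch up
        });
        writer.start();
        awaitQuietly(mapUpdated);
        String seen = "containsKey=" + map.containsKey("k") + ", size=" + size.get();
        readerDone.countDown();
        join(writer);
        return seen + ", after=" + size.get();
    }

    private static void awaitQuietly(CountDownLatch latch) {
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    private static void join(Thread t) {
        try {
            t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }
}
```

`WindowDemo.observe()` returns `"containsKey=true, size=0, after=1"`: the atomic counter is perfectly up to date with respect to its own writes and still disagrees with the map.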

Fact #1: Updating a single size field has the potential to be a concurrency bottleneck, whether it is implemented as a volatile int, an AtomicInteger or something else.
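The usual way around a single hot counter is a striped counter such as `java.util.concurrent.atomic.LongAdder`, which spreads contended updates over several cells and adds them up on read. Java 8's ConcurrentHashMap does something similar internally (a `baseCount` field plus an array of `CounterCell`s, summed by `size()` and `mappingCount()`). The sketch below uses LongAdder directly rather than the map's private internals; `StripedCount` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.LongAdder;

final class StripedCount {
    // All threads increment the same LongAdder; under contention it falls back
    // to per-cell updates instead of every thread retrying a CAS on one field.
    static long count(int threads, int perThread) {
        LongAdder adder = new LongAdder();
        List<Thread> workers = new ArrayList<>();
        for (int t = 0; t < threads; t++) {
            Thread worker = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    adder.increment();
                }
            });
            workers.add(worker);
            worker.start();
        }
        for (Thread worker : workers) {
            join(worker);
        }
        // sum() is exact here only because all writers have finished;
        // while updates are running it is not an atomic snapshot.
        return adder.sum();
    }

    private static void join(Thread t) {
        try {
            t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }
}
```

The trade-off mirrors the map's own: cheap, low-contention updates in exchange for a read that is only an estimate while updates are in flight.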

Fact #2: In the vast majority of use cases for ConcurrentHashMap, the size() method is not useful to the application, because it only tells you the size at a particular instant. (If even that: the javadoc does not actually specify what size() will return in the face of concurrent updates.)

If we assume that size() is unlikely to be called, then introducing a potential concurrency bottleneck into the update operations just in case it is called seems like a bad idea to me. I'm guessing that the author (Doug Lea) had similar thoughts.
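If an application really needs a size that is exact with respect to the map's contents, as the question asks, some form of the locking the question hoped to avoid is unavoidable. One sketch, under the assumption that occasional stop-the-world size reads are acceptable: updates share a read lock (so they still run concurrently with each other), and `exactSize()` takes the write lock, which waits until no update is in progress. `ExactSizeMap` is a hypothetical name.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical wrapper: updates hold the shared (read) lock, exactSize()
// holds the exclusive (write) lock, so it never overlaps an update.
final class ExactSizeMap<K, V> {
    private final ConcurrentHashMap<K, V> map = new ConcurrentHashMap<>();
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

    V put(K key, V value) {
        lock.readLock().lock();
        try {
            return map.put(key, value);
        } finally {
            lock.readLock().unlock();
        }
    }

    V remove(K key) {
        lock.readLock().lock();
        try {
            return map.remove(key);
        } finally {
            lock.readLock().unlock();
        }
    }

    // Exact at the moment it is computed, because no update is in flight;
    // it can be stale again as soon as the lock is released.
    int exactSize() {
        lock.writeLock().lock();
        try {
            return map.size();
        } finally {
            lock.writeLock().unlock();
        }
    }
}
```

Note that every update now modifies the lock's shared state, which is the same kind of single contended hotspot Fact #1 warns about, and the exact value is still only exact for an instant. That is the price the real class declines to charge every caller.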