
ConcurrentHashMap.keySet().removeAll() performance issue

I am reading about ConcurrentHashMap and looked at the removeAll() implementation of its key set. In the current implementation, Java iterates the whole key set data structure even if the given collection contains only one element, or none at all.

Actual implementation

public final boolean removeAll(Collection<?> c) {
    if (c == null) throw new NullPointerException();
    boolean modified = false;
    for (Iterator<E> it = iterator(); it.hasNext();) {
        if (c.contains(it.next())) {
            it.remove();
            modified = true;
        }
    }
    return modified;
}

Could someone tell me whether this is intended by the Java developers, or am I just overthinking it?

Posting the implementation I expected. To be frank, I am not the author of the code below; I found it on a coding blog while reading about ConcurrentHashMap, so I would like to share it with everyone else.

public boolean removeAll(Collection<?> c) {
    boolean modified = false;

    if (size() > c.size()) {
        for (Iterator<?> i = c.iterator(); i.hasNext(); )
            modified |= remove(i.next());
    } else {
        for (Iterator<?> i = iterator(); i.hasNext(); ) {
            if (c.contains(i.next())) {
                i.remove();
                modified = true;
            }
        }
    }
    return modified;
}

The key set of a regular Map (such as HashMap) extends AbstractSet. For OpenJDK 8, the source code shows this as the removeAll method:

public boolean removeAll(Collection<?> c) {
    Objects.requireNonNull(c);
    boolean modified = false;

    if (size() > c.size()) {
        for (Iterator<?> i = c.iterator(); i.hasNext(); )
            modified |= remove(i.next());
    } else {
        for (Iterator<?> i = iterator(); i.hasNext(); ) {
            if (c.contains(i.next())) {
                i.remove();
                modified = true;
            }
        }
    }
    return modified;
}

As you can see, there's a check for which collection has the most entries. If the Set itself has a larger size, iteration is done over the given collection argument. Otherwise it's done over the entries of the Set itself. So if the collection you pass in has no entries, zero actual loop executions are performed. Same if the Set itself has no entries.
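For illustration, here is a minimal sketch (class name and sizes are mine, not from the question) showing which branch the AbstractSet version takes for a plain HashMap key set:

import java.util.*;

public class AbstractSetRemoveAllDemo {
    public static void main(String[] args) {
        Map<Integer, String> map = new HashMap<>();
        for (int i = 0; i < 1_000_000; i++) {
            map.put(i, "v" + i);
        }

        // HashMap's key set inherits removeAll from AbstractSet.
        // The set size (1_000_000) is greater than the argument size (2),
        // so only the two-element argument is iterated: two remove() calls,
        // not a scan over a million keys.
        boolean changed = map.keySet().removeAll(Arrays.asList(1, 2));
        System.out.println(changed);    // true
        System.out.println(map.size()); // 999998

        // An empty argument results in zero loop iterations.
        System.out.println(map.keySet().removeAll(Collections.emptyList())); // false
    }
}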

For ConcurrentHashMap, however, the keySet() method returns an instance of the internal class KeySetView, which extends another internal class, CollectionView. The removeAll implementation there matches the code you posted: it always iterates over the KeySetView entries themselves, not over the given collection.

The reason for this is likely that the Iterators returned by the views (key set or entry set) allow concurrent access by reflecting the values that are present at the time the iterator is requested. From the Javadoc of ConcurrentHashMap:

Similarly, Iterators and Enumerations return elements reflecting the state of the hash table at some point at or since the creation of the iterator/enumeration. They do not throw ConcurrentModificationException. However, iterators are designed to be used by only one thread at a time.

So the method forces the use of the key view's own Iterator to take care of consistency across concurrent actions on the view or the map.
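In practice, if you know the argument collection is much smaller than the map, you can sidestep the full key scan by iterating the argument yourself and removing key by key. A rough sketch (the removeKeys helper is my own, not part of the JDK):

import java.util.Arrays;
import java.util.Collection;
import java.util.concurrent.ConcurrentHashMap;

public class SmallArgumentRemoval {

    // Hypothetical helper: removes the given keys one by one instead of
    // calling keySet().removeAll(c), which would iterate every key in the map.
    static <K, V> boolean removeKeys(ConcurrentHashMap<K, V> map, Collection<? extends K> keys) {
        boolean modified = false;
        for (K key : keys) {
            // ConcurrentHashMap.remove(key) is a single hash lookup,
            // independent of the total number of entries in the map.
            modified |= (map.remove(key) != null);
        }
        return modified;
    }

    public static void main(String[] args) {
        ConcurrentHashMap<Integer, String> map = new ConcurrentHashMap<>();
        for (int i = 0; i < 1_000_000; i++) {
            map.put(i, "v" + i);
        }
        System.out.println(removeKeys(map, Arrays.asList(7, 42))); // true
        System.out.println(map.size());                            // 999998
    }
}

Note that each remove here is an independent atomic operation, so this is a sketch of a workaround rather than a drop-in replacement in cases where you rely on the view's own iteration semantics.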

Note however that the above implementation for AbstractSet isn't necessarily optimal either. If the collection c you supply as an argument has a larger size than the set, c.contains(element) is called for every element in the set, but depending on the collection's type that contains method might not be nearly as efficient. For an ArrayList, for example, contains runs in linear time, while presence of the object in a set would be detected in constant time.
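One way to mitigate that, assuming you control the calling code, is to copy the argument into a HashSet first so that each contains check becomes a constant-time hash lookup. A small sketch (class name and sizes are mine):

import java.util.*;

public class ContainsCostDemo {
    public static void main(String[] args) {
        Set<Integer> keys = new HashSet<>();
        for (int i = 0; i < 100_000; i++) {
            keys.add(i);
        }

        // If this large ArrayList were passed directly, every contains(...)
        // call inside removeAll would be a linear scan over it.
        List<Integer> toRemove = new ArrayList<>();
        for (int i = 0; i < 200_000; i++) {
            toRemove.add(i);
        }

        // Copying the argument into a HashSet up front costs one pass over
        // the list, after which each contains(...) is a hash lookup.
        keys.removeAll(new HashSet<>(toRemove));
        System.out.println(keys.size()); // 0
    }
}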

Collection vs. Set

The implementation in question comes from the AbstractCollection class, while the implementation in the proposed solution comes from AbstractSet, which inherits from AbstractCollection.

The performance improvement stems from the fact that in a Collection you cannot guarantee that the elements are unique, so a size() call alone is not enough to optimise the removal. In a Set, however, uniqueness is guaranteed, so additional assumptions and performance improvements can be made, such as iterating through the smaller collection when establishing which elements to remove.
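To illustrate why size() alone is only a heuristic, here is a small sketch (class name is mine) in which a Collection with many duplicates reports a large size while holding a single distinct value:

import java.util.*;

public class SizeHeuristicDemo {
    public static void main(String[] args) {
        Set<String> keys = new HashSet<>(Arrays.asList("a", "b", "c"));

        // A Collection may report a huge size() while containing a single
        // distinct value, so comparing sizes only approximates which side
        // is cheaper to iterate.
        List<String> arg = new ArrayList<>(Collections.nCopies(1_000_000, "a"));

        // AbstractSet.removeAll: keys.size() (3) is not greater than
        // arg.size() (1_000_000), so the 3-element set is iterated and
        // arg.contains(...) is called for each element -- each call a
        // linear scan over the million-entry list.
        keys.removeAll(arg);
        System.out.println(keys); // "a" removed; "b" and "c" remain
    }
}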

Highly Specialised Map and its sets

The story, however, is entirely different for both the key and value sets of ConcurrentHashMap. This is because ConcurrentHashMap is a highly specialised Map, with many assumptions made and improvements built in for the sake of concurrency. That makes generic performance improvements such as the one in Set.removeAll() no longer quite valid, in the light of the actual implementation.

The ConcurrentHashMap iterator is weakly consistent and does a lot of magic behind the scenes. Perhaps removeAll on the key set is the price one pays for all the other gains (from the concurrent-access point of view).
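As a small illustration of that weak consistency (class name is mine), the key-set view's iterator tolerates modification of the map during iteration and supports remove(), whereas a HashMap iterator would typically throw ConcurrentModificationException:

import java.util.Iterator;
import java.util.concurrent.ConcurrentHashMap;

public class WeaklyConsistentIteratorDemo {
    public static void main(String[] args) {
        ConcurrentHashMap<String, Integer> map = new ConcurrentHashMap<>();
        map.put("a", 1);
        map.put("b", 2);
        map.put("c", 3);

        // Modifying the map while iterating its key set view does not throw
        // ConcurrentModificationException; the iterator is weakly consistent
        // and reflects the state at or after its creation.
        Iterator<String> it = map.keySet().iterator();
        while (it.hasNext()) {
            String key = it.next();
            map.put("d", 4);          // structural modification, no exception
            if (key.equals("b")) {
                it.remove();          // removal through the view's iterator
            }
        }
        System.out.println(map.keySet()); // "b" removed; a, c, d remain (order not guaranteed)
    }
}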

Just follow iterator().remove() on ConcurrentHashMap down the rabbit hole of ConcurrentHashMap.replaceNode() to see how much logic is hiding in the source code of replaceNode() to accommodate removal of elements through the iterator.
