
Java Collections containsAll Weird Behavior

I have the following code, where I am using superList and subList. I want to check that subList is actually a sublist of superList.

My objects do not implement the hashCode or equals methods. I have recreated a similar situation in a test. When I run the test, the results show a very big performance difference between the JDK collections and Commons Collections. After running the test I get the following output.

Time Lapsed with Java Collection API 8953 MilliSeconds & Result is true
Time Lapsed with Commons Collection API 78 MilliSeconds & Result is true

My question is why the Java collection is so slow at processing the containsAll operation. Am I doing something wrong there? I have no control over the collection types; I am getting them from legacy code. I know if I use HashSet for superList then I would get big performance gains using JDK containsAll operation, but unfortunately that is not possible for me.

package com.mycompany.tests;

import java.util.ArrayList;
import java.util.Collection;
import java.util.HashSet;

import org.apache.commons.collections.CollectionUtils;
import org.junit.Before;
import org.junit.Test;

public class CollectionComparison_UnitTest {

    private Collection<MyClass> superList = new ArrayList<MyClass>();
    private Collection<MyClass> subList = new HashSet<MyClass>(50000);

    @Before
    public void setUp() throws Exception {

        for (int i = 0; i < 50000; i++) {
            MyClass myClass = new MyClass(i + "A String");
            superList.add(myClass);
            subList.add(myClass);
        }
    }

    @Test
    public void testIt() {
        long startTime = System.currentTimeMillis();
        boolean isSubList = superList.containsAll(subList);
        System.out.println("Time Lapsed with Java Collection API "
                + (System.currentTimeMillis() - startTime)
                + " MilliSeconds & Result is " + isSubList);

        startTime = System.currentTimeMillis();
        isSubList = CollectionUtils.isSubCollection(subList, superList);
        System.out.println("Time Lapsed with Commons Collection API "
                + (System.currentTimeMillis() - startTime)
                + " MilliSeconds & Result is " + isSubList);
    }   
}

class MyClass {
    String myString;

    MyClass(String myString) {
        this.myString = myString;
    }

    String getMyString() {
        return myString;
    }

}

Different algorithms:

ArrayList.containsAll() is O(N*N), while CollectionUtils.isSubCollection() is O(N+N+N).

You should at least try the tests in the opposite order. Your results may very well just show that the JIT compiler is doing its job well :-)
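One way to act on that suggestion is to warm each check up before timing it and to run the pair in both orders. The following is only a rough sketch: the class and method names are made up for illustration, a hashed copy stands in for the Commons Collections call so that the example has no external dependency, and the list is smaller than in the question. For serious measurements a benchmark harness would be a better choice.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashSet;
import java.util.function.BooleanSupplier;

// Hypothetical helper, not part of the question's code.
final class WarmedUpTiming {

    // Runs the check warmupRuns times untimed, then times one more run in nanoseconds.
    static long timeNanos(BooleanSupplier check, int warmupRuns) {
        for (int i = 0; i < warmupRuns; i++) {
            check.getAsBoolean();
        }
        long start = System.nanoTime();
        check.getAsBoolean();
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        Collection<Object> superList = new ArrayList<>();
        Collection<Object> subList = new HashSet<>();
        for (int i = 0; i < 5_000; i++) {
            Object o = new Object();
            superList.add(o);
            subList.add(o);
        }

        // The two approaches being compared (the second is a stand-in, see above).
        BooleanSupplier plainLoop = () -> superList.containsAll(subList);
        BooleanSupplier hashedCopy = () -> new HashSet<Object>(superList).containsAll(subList);

        // First order: plain loop, then hashed copy.
        System.out.println("plain first:  " + timeNanos(plainLoop, 3) + " ns");
        System.out.println("hashed second: " + timeNanos(hashedCopy, 3) + " ns");

        // Opposite order: hashed copy, then plain loop.
        System.out.println("hashed first: " + timeNanos(hashedCopy, 3) + " ns");
        System.out.println("plain second: " + timeNanos(plainLoop, 3) + " ns");
    }
}
```

If the gap between the two approaches survives both orders and the warm-up, it is unlikely to be a JIT artifact.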

ArrayList.containsAll is inherited from AbstractCollection.containsAll and is a simple loop that checks each element in turn. Each step is a slow linear search. I don't know how CollectionUtils works, but it's not hard to do it much faster than using the simple loop. Converting the second List to a HashSet is a sure win. Sorting both lists and going through them in parallel could be even better.
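To make the difference concrete, here is a sketch of the two ideas. containsAllNaive only approximates what the inherited loop does (it is not the JDK source), and containsAllSorted assumes the elements are Comparable and that compareTo == 0 means "same element", which is not true of the question's MyClass. All names here are invented for the example.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.List;

// Hypothetical illustration class.
final class ContainsAllSketch {

    // One contains() call per needle. On an ArrayList every call is a
    // linear scan, so the whole check costs O(N*M) comparisons.
    static boolean containsAllNaive(Collection<?> haystack, Collection<?> needles) {
        for (Object needle : needles) {
            if (!haystack.contains(needle)) {
                return false;
            }
        }
        return true;
    }

    // Sort copies of both collections, then walk them in parallel.
    // Sorting dominates: O(N log N + M log M).
    static <T extends Comparable<? super T>> boolean containsAllSorted(
            Collection<? extends T> haystack, Collection<? extends T> needles) {
        List<T> hay = new ArrayList<>(haystack);
        List<T> need = new ArrayList<>(needles);
        Collections.sort(hay);
        Collections.sort(need);

        int i = 0;
        for (T n : need) {
            // Skip haystack elements that are smaller than the current needle.
            while (i < hay.size() && hay.get(i).compareTo(n) < 0) {
                i++;
            }
            // Either the haystack ran out or the next element is larger: not found.
            if (i == hay.size() || hay.get(i).compareTo(n) != 0) {
                return false;
            }
            // Do not advance i here, so repeated needles match the same element.
        }
        return true;
    }
}
```

Both methods copy their inputs or leave them untouched, so the legacy collections are never modified.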

EDIT:

The CollectionUtils source code makes it clear. They're converting both collections to "cardinality maps", which is a simple and general approach that works for many operations. In some cases it may not be a good idea, e.g., when the first list is empty or very short, you in fact lose time. In your case it's a huge win in comparison to AbstractCollection.containsAll, but you could do even better.
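The cardinality-map idea can be sketched as follows. This is a from-scratch illustration using java.util.HashMap, not the Commons Collections source, and the class name is made up. Each collection is counted once, then the counts are compared.

```java
import java.util.Collection;
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration class, not the library implementation.
final class CardinalitySketch {

    // Maps each distinct element to the number of times it occurs: O(N).
    static Map<Object, Integer> cardinalityMap(Collection<?> c) {
        Map<Object, Integer> counts = new HashMap<>();
        for (Object e : c) {
            counts.merge(e, 1, Integer::sum);
        }
        return counts;
    }

    // True if every element of a occurs in b at least as often as it occurs in a.
    static boolean isSubCollection(Collection<?> a, Collection<?> b) {
        Map<Object, Integer> countsA = cardinalityMap(a);
        Map<Object, Integer> countsB = cardinalityMap(b);
        for (Map.Entry<Object, Integer> entry : countsA.entrySet()) {
            if (entry.getValue() > countsB.getOrDefault(entry.getKey(), 0)) {
                return false;
            }
        }
        return true;
    }
}
```

Note that this check respects multiplicity, so ["a", "a"] is not a sub-collection of ["a"], whereas containsAll would return true for the same inputs. It also builds both maps even when a is empty, which is the wasted work mentioned above.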

Addendum years later

The OP wrote

I know if I use HashSet for superList then I would get big performance gains using JDK containsAll operation, but unfortunately that is not possible for me.

and that's wrong. Classes without hashCode and equals inherit them from Object and can be used with a HashSet, and everything works perfectly. The one catch is that each object is equal only to itself, which may be unintended and surprising, but the OP's test superList.containsAll(subList) does exactly the same thing.

So the quick solution would be

new HashSet<>(superList).containsAll(subList)
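A small self-contained sketch of that one-liner, using a stand-in class without equals or hashCode (Item and isSubset are names invented for this example, not from the question):

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;

// Hypothetical illustration class.
final class IdentityHashDemo {

    // Deliberately has no equals/hashCode, like the question's MyClass,
    // so it inherits the identity-based versions from Object.
    static final class Item {
        final String name;

        Item(String name) {
            this.name = name;
        }
    }

    // Copies superList into a HashSet once (O(N)), then does one hash lookup per element.
    static boolean isSubset(Collection<?> superList, Collection<?> subList) {
        return new HashSet<Object>(superList).containsAll(subList);
    }

    public static void main(String[] args) {
        List<Item> superList = new ArrayList<>();
        Item a = new Item("a");
        superList.add(a);
        superList.add(new Item("b"));

        // Same instance: found.
        System.out.println(isSubset(superList, Collections.singletonList(a)));
        // Different instance with the same name: not found, exactly as with ArrayList.containsAll.
        System.out.println(isSubset(superList, Collections.singletonList(new Item("a"))));
    }
}
```

Because the HashSet is a temporary copy, the legacy collection types stay as they are.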
