
Java Collections containsAll Weird Behavior

I have the following code, which uses superList and subList. I want to check that subList is actually a sub-list of superList.

My objects do not implement the hashCode or equals methods. I have recreated a similar situation in the test below. When I run the test, the results show a very big performance difference between the JDK collections and Commons Collections. After running the test I get the following output.

Time Lapsed with Java Collection API 8953 MilliSeconds & Result is true
Time Lapsed with Commons Collection API 78 MilliSeconds & Result is true

My question is: why is the Java collection so slow at processing the containsAll operation? Am I doing something wrong there? I have no control over the collection types, since I am getting them from legacy code. I know that if I used a HashSet for superList I would get big performance gains from the JDK containsAll operation, but unfortunately that is not possible for me.

package com.mycompany.tests;

import java.util.ArrayList;
import java.util.Collection;
import java.util.HashSet;

import org.apache.commons.collections.CollectionUtils;
import org.junit.Before;
import org.junit.Test;

public class CollectionComparison_UnitTest {

    private Collection<MyClass> superList = new ArrayList<MyClass>();
    private Collection<MyClass> subList = new HashSet<MyClass>(50000);

    @Before
    public void setUp() throws Exception {

        for (int i = 0; i < 50000; i++) {
            MyClass myClass = new MyClass(i + "A String");
            superList.add(myClass);
            subList.add(myClass);
        }
    }

    @Test
    public void testIt() {
        long startTime = System.currentTimeMillis();
        boolean isSubList = superList.containsAll(subList);
        System.out.println("Time Lapsed with Java Collection API "
                + (System.currentTimeMillis() - startTime)
                + " MilliSeconds & Result is " + isSubList);

        startTime = System.currentTimeMillis();
        isSubList = CollectionUtils.isSubCollection(subList, superList);
        System.out.println("Time Lapsed with Commons Collection API "
                + (System.currentTimeMillis() - startTime)
                + " MilliSeconds & Result is " + isSubList);
    }   
}

class MyClass {
    String myString;

    MyClass(String myString) {
        this.myString = myString;
    }

    String getMyString() {
        return myString;
    }

}

Different algorithms:

ArrayList.containsAll() takes O(N*N) time, while CollectionUtils.isSubCollection() takes O(N+N+N), which is O(N) as long as hashing works well.

You should at least try the tests in the opposite order. Your results may very well just show that the JIT compiler is doing its job well :-)

ArrayList.containsAll is inherited from AbstractCollection.containsAll and is a simple loop that checks the elements of the argument one after another. Each step is a slow linear search through the list. I don't know how CollectionUtils works, but it's not hard to do it much faster than the simple loop. Converting the list being searched (superList here) to a HashSet is a sure win. Sorting both lists and going through them in parallel could be even better, provided the elements can be ordered.
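The two strategies can be sketched roughly as follows. This is an illustration only, not the JDK source: the class ContainsAllSketch and the helpers linearContainsAll and hashedContainsAll are names made up for this example.

```java
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Objects;
import java.util.Set;

// Hypothetical helper class, written only to illustrate the two strategies.
class ContainsAllSketch {

    // Mimics the simple loop: one linear search of superList per element of sub.
    // In the worst case this costs sub.size() * superList.size() equals() calls.
    static boolean linearContainsAll(List<?> superList, Collection<?> sub) {
        for (Object element : sub) {
            boolean found = false;
            for (Object candidate : superList) {
                if (Objects.equals(element, candidate)) {
                    found = true;
                    break;
                }
            }
            if (!found) {
                return false;
            }
        }
        return true;
    }

    // Copies superList into a HashSet once, so each later lookup is a hash probe
    // instead of a scan. Relies on the elements' hashCode()/equals().
    static boolean hashedContainsAll(Collection<?> superList, Collection<?> sub) {
        Set<Object> lookup = new HashSet<Object>(superList);
        return lookup.containsAll(sub);
    }
}
```

Both methods should return the same answer for the same inputs; only the amount of work differs.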

EDIT:

The CollectionUtils source code makes it clear. They're converting both collections to "cardinality maps", which is a simple and general approach that works for many operations. In some cases it may not be a good idea, e.g., when the first list is empty or very short, you in fact lose time. In your case it's a huge win in comparison to AbstractCollection.containsAll, but you could do even better.
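The cardinality-map idea can be sketched like this. It is a simplified reconstruction for illustration, not the Commons Collections source; the class CardinalitySketch and its method names are made up here.

```java
import java.util.Collection;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of a cardinality-map based sub-collection check.
class CardinalitySketch {

    // Counts how many times each element occurs (one pass over the collection).
    static Map<Object, Integer> cardinalityMap(Collection<?> items) {
        Map<Object, Integer> counts = new HashMap<Object, Integer>();
        for (Object item : items) {
            counts.merge(item, 1, Integer::sum);
        }
        return counts;
    }

    // sub is a sub-collection of sup if no element occurs more often in sub than in sup.
    static boolean isSubCollection(Collection<?> sub, Collection<?> sup) {
        Map<Object, Integer> subCounts = cardinalityMap(sub);
        Map<Object, Integer> supCounts = cardinalityMap(sup);
        for (Map.Entry<Object, Integer> entry : subCounts.entrySet()) {
            int available = supCounts.getOrDefault(entry.getKey(), 0);
            if (entry.getValue() > available) {
                return false;
            }
        }
        return true;
    }
}
```

Note that a check like this counts duplicates, so a list holding the same element twice is not a sub-collection of a list holding it once, whereas containsAll would report true in that case. It also builds both maps even when sub is empty, which is the wasted work mentioned above.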

Addendum years later

The OP wrote

I know if I use HashSet for superList then I would get big performance gains using JDK containsAll operation, but unfortunately that is not possible for me.

and that's wrong. Classes without hashCode and equals inherit them from Object and can be used with a HashSet, and everything works perfectly. The one caveat is that each object is equal only to itself, which may be unintended and surprising, but the OP's test superList.containsAll(subList) behaves in exactly the same way.

So the quick solution would be

new HashSet<>(superList).containsAll(subList)
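A small sketch of why the one-liner above is safe for classes that do not override equals or hashCode. The class IdentityEqualityDemo, its nested Item class, and the helper fastContainsAll are invented for this example and stand in for the OP's legacy types.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;

// Hypothetical demo; Item plays the role of MyClass (no equals/hashCode overrides).
class IdentityEqualityDemo {

    static class Item {
        final String name;

        Item(String name) {
            this.name = name;
        }
    }

    // Wraps the super collection in a HashSet before calling containsAll.
    // With the inherited Object.equals/hashCode, two items match only if they
    // are the same instance, which is also how ArrayList.containsAll treats them.
    static boolean fastContainsAll(Collection<?> superList, Collection<?> subList) {
        return new HashSet<Object>(superList).containsAll(subList);
    }

    public static void main(String[] args) {
        List<Item> superList = new ArrayList<Item>();
        List<Item> subList = new ArrayList<Item>();
        for (int i = 0; i < 5; i++) {
            Item item = new Item(i + "A String");
            superList.add(item);
            subList.add(item); // same instances in both collections
        }
        System.out.println(fastContainsAll(superList, subList)); // true

        subList.add(new Item("0A String")); // same text, different instance
        System.out.println(fastContainsAll(superList, subList)); // false
        System.out.println(superList.containsAll(subList));      // false as well
    }
}
```

The extra copy costs one pass over superList, so it pays off when subList is large, as in the OP's test.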

