
In Java, what is the quickest way to check whether a list contains items from another list when both lists are of the same type?

Say I have a class called MyClass as follows:

public class MyClass
{
     //Identifier is alphanumeric. If the identifier starts with 'ZZ',
     //it is a special identifier.
     private String identifier = null;
     //Date string format YYYY-MM-DD
     private String dateString = null;
     //Just a flag (not important for this scenario)
     private boolean isCoolCat = false;
     //Default Constructor and getters/setters implemented
     //Overrides the standard Java equals() method.
     //This way, when ArrayList calls contains() for MyClass objects
     //it will only check the Date (for ZZ identifier) 
     //and identifier values against each other instead of
     //also comparing the isCoolCat indicator value.
     @Override
     public boolean equals(Object obj)
     {
          if(this == obj)
          {
               return true;
          }
          if(obj == null)
          {
               return false;
          }
          if(getClass() != obj.getClass())
          {
               return false;
          }
          MyClass other = (MyClass) obj;
          if(this.identifier == null)
          {
               if(other.identifier != null)
               {
                    return false;
               }
          } else if(!this.identifier.equals(other.identifier)) {
               return false;
          }
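          //Identifiers are equal at this point; guard against both being null.
          if(this.identifier != null && this.identifier.startsWith("ZZ"))
          {
               if(this.dateString == null
                    ? other.dateString != null
                    : !this.dateString.equals(other.dateString))
               {
                    return false;
               }
          }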
          return true;
     }
}

In another class I have two Lists of MyClass, each containing 100,000 objects. I need to check whether the items in one list are in the other list, and I currently do this as follows:

List<MyClass> inList = new ArrayList<MyClass>();
List<MyClass> outList = new ArrayList<MyClass>();
inList = someMethodForIn();
outList = someMethodForOut();
//The for loop iterates through inList and checks whether outList contains
//each MyClass object from inList; if it doesn't, the object is added.
for(MyClass inObj : inList)
{
     if(!outList.contains(inObj))
     {
          outList.add(inObj); 
     }
}

My question is: is this the fastest way to accomplish this? If not, can you please show me a better implementation that will give me a performance boost? The list size is not always going to be 100,000; it can vary from 1 to 1,000,000. Currently, on my platform, it takes about 2 minutes for lists of 100,000 elements.

You want to use a Set for this. A HashSet has a contains method that can determine whether an object is in the set in O(1) time on average.

A few things to watch out for when converting from List<MyClass> to Set<MyClass>:

  1. You will lose the ordering of the elements (a HashSet does not keep insertion order).
  2. You will lose the duplicate elements.
  3. Your MyClass needs to implement hashCode() and equals(), and they must be consistent: objects that are equal must return the same hash code.
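
Point 3 matters here because the question's equals() only compares dateString when the identifier starts with "ZZ", so hashCode() has to follow the same rule. Below is a minimal sketch of one way to do that. `Item` is a hypothetical, trimmed-down stand-in for MyClass (only the two compared fields, made final for brevity), not the asker's actual class:

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

// Hypothetical stand-in for MyClass, reduced to the fields equals() looks at.
final class Item {
    private final String identifier;
    private final String dateString;

    Item(String identifier, String dateString) {
        this.identifier = identifier;
        this.dateString = dateString;
    }

    // True when the date takes part in the comparison ("ZZ" identifiers).
    private boolean isSpecial() {
        return identifier != null && identifier.startsWith("ZZ");
    }

    @Override
    public boolean equals(Object obj) {
        if (this == obj) return true;
        if (obj == null || getClass() != obj.getClass()) return false;
        Item other = (Item) obj;
        if (!Objects.equals(identifier, other.identifier)) return false;
        return !isSpecial() || Objects.equals(dateString, other.dateString);
    }

    // Hashes exactly the fields equals() compares, so equal objects share a hash.
    @Override
    public int hashCode() {
        return isSpecial() ? Objects.hash(identifier, dateString)
                           : Objects.hashCode(identifier);
    }
}

class HashCodeDemo {
    public static void main(String[] args) {
        Set<Item> set = new HashSet<>();
        set.add(new Item("AB1", "2020-01-01"));
        set.add(new Item("ZZ1", "2020-01-01"));
        // Ordinary identifier: the date is ignored, so this is found.
        System.out.println(set.contains(new Item("AB1", "1999-12-31"))); // true
        // "ZZ" identifier: the date differs, so this is not found.
        System.out.println(set.contains(new Item("ZZ1", "1999-12-31"))); // false
    }
}
```

If hashCode() included dateString for every object, two ordinary items that equals() treats as the same could land in different buckets, and contains() would miss them.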

To convert your List to Set you can just use:

Set<MyClass> s1 = new HashSet<>(inList);
Set<MyClass> s2 = new HashSet<>(outList);

The Java documentation explains how to find the union, intersection, and difference of two sets. In particular, it seems like you're interested in the union:

// transforms s2 into the union of s1 and s2. (The union of two sets 
// is the set containing all of the elements contained in either set.)
s2.addAll(s1);
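
If the order of the elements matters, a LinkedHashSet might be worth a look, since it iterates in insertion order. Here is a rough sketch of the same union that hands back a List; the `OrderedUnion` class and `union` method are names made up for this example, and Strings stand in for MyClass objects:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class OrderedUnion {
    // Returns the distinct elements of out, in order, followed by the
    // elements of in that out does not already contain.
    static <T> List<T> union(List<T> out, List<T> in) {
        Set<T> merged = new LinkedHashSet<>(out); // keeps insertion order
        merged.addAll(in);                        // skips elements already present
        return new ArrayList<>(merged);
    }

    public static void main(String[] args) {
        List<String> out = Arrays.asList("b", "a", "c");
        List<String> in = Arrays.asList("c", "d", "a", "e");
        System.out.println(union(out, in)); // [b, a, c, d, e]
    }
}
```

Note that this still drops duplicates that were already in the original list, so it only fits if that is acceptable.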

Hashing! Hashing is always the answer!

The current complexity of this code is O(nm), where n is the size of inList and m is the size of outList.

You can use a HashSet to reduce the complexity to O(n + m), because contains will now take O(1) on average.

This can be done like this:

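   HashSet<MyClass> outSet = new HashSet<>(outList);
   for(MyClass inObj : inList)
   {
        //add() returns false if an equal element is already in the set,
        //so duplicates within inList are appended only once.
        if(outSet.add(inObj))
        {
             outList.add(inObj);
        }
   }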
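
To get a feel for the difference on your own machine, something like the following rough timing sketch could help. Everything in it is an assumption made for illustration: the `MergeTiming` class, the two method names, the use of Integer instead of MyClass, and the list size of 20,000. It is not a proper benchmark (no warm-up, single run), so treat the numbers as a ballpark only:

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class MergeTiming {
    // Original approach: every contains() scans the list, O(n*m) overall.
    static List<Integer> mergeWithList(List<Integer> in, List<Integer> out) {
        List<Integer> result = new ArrayList<>(out);
        for (Integer x : in) {
            if (!result.contains(x)) {
                result.add(x);
            }
        }
        return result;
    }

    // Set-backed approach: add() returns false for elements already seen.
    static List<Integer> mergeWithSet(List<Integer> in, List<Integer> out) {
        List<Integer> result = new ArrayList<>(out);
        Set<Integer> seen = new HashSet<>(out);
        for (Integer x : in) {
            if (seen.add(x)) {
                result.add(x);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        int n = 20_000; // arbitrary size chosen for the sketch
        List<Integer> in = new ArrayList<>();
        List<Integer> out = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            out.add(i);
            in.add(i + n / 2); // half of in overlaps with out
        }
        long t0 = System.nanoTime();
        List<Integer> a = mergeWithList(in, out);
        long t1 = System.nanoTime();
        List<Integer> b = mergeWithSet(in, out);
        long t2 = System.nanoTime();
        System.out.println("list: " + (t1 - t0) / 1_000_000 + " ms");
        System.out.println("set:  " + (t2 - t1) / 1_000_000 + " ms");
        System.out.println("same result: " + a.equals(b));
    }
}
```

Both methods work on a copy of the out list, so the inputs are left unchanged and the two results can be compared directly.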

Credits and Sources.

returning difference between two lists in java

Time complexity of contains(Object o), in an ArrayList of Objects

HashSet.contains performance

Two minutes to compare two very large lists: you are probably not going to get much time savings here. So, depending on your application, can you set a flag so that anything dependent on this cannot run until it has finished, push the work onto its own thread, and let the user do something else (while also telling them that it is ongoing)? Or at least put up a progress bar. Letting the user know that the app is busy, and roughly how long it will take, is fine for a complex computation that only takes a few minutes, and is probably better than just shaving a few seconds off the time. Users are quite tolerant of delays if they know how long the delays will be and you tell them there is time to go get a coffee.
