简体   繁体   中英

Comparing two large lists in java

I have to Array lists with 1000 objects in each of them. I need to remove all elements in Array list 1 which are there in Array list 2. Currently I am running 2 loops which is resulting in 1000 x 1000 operations in worst case.


List<DataClass> dbRows = object1.get("dbData");
List<DataClass> modifiedData = object1.get("dbData");
List<DataClass> dbRowsForLog = object2.get("dbData");
for (DataClass newDbRows : dbRows) {
            boolean found=false;
            for (DataClass oldDbRows : dbRowsForLog) {
                if (newDbRows.equals(oldDbRows)) {
                    found=true;
                    modifiedData.remove(oldDbRows);
                    break;
                }
            }
        }

public class DataClass{
    private int categoryPosition;
    private int subCategoryPosition;
    private Timestamp lastUpdateTime;
    private String lastModifiedUser;
    // + so many other variables 
    
    public boolean equals(Object o) {
        if (this == o) {
            return true;
        }
        if (o == null || getClass() != o.getClass()) {
            return false;
        }
        DataClass dataClassRow = (DataClass) o;
        return  categoryPosition == dataClassRow.categoryPosition
                && subCategoryPosition == dataClassRow.subCategoryPosition && (lastUpdateTime.compareTo(dataClassRow.lastUpdateTime)==0?true:false)
                && stringComparator(lastModifiedUser,dataClassRow.lastModifiedUser);
    }

    public String toString(){
        return "DataClass[categoryPosition="+categoryPosition+",subCategoryPosition="+subCategoryPosition
                +",lastUpdateTime="+lastUpdateTime+",lastModifiedUser="+lastModifiedUser+"]";
    }
    
    public static boolean stringComparator(String str1, String str2){
         return (str1 == null ? str2 == null : str1.equals(str2));
    }

    public int hashCode() {
    int hash = 7;
    hash = 31 * hash + (int) categoryPosition;
    hash = 31 * hash + (int) subCategoryPosition
    hash = 31 * hash + (lastModifiedUser == null ? 0 : lastModifiedUser.hashCode());
    return hash;
    }
}

The best work around i could think of is create 2 sets of strings by calling tostring() method of DataClass and compare string. It will result in 1000 (for making set1) + 1000 (for making set 2) + 1000 (searching in set ) = 3000 operations. I am stuck in Java 7. Is there any better way to do this? Thanks.

Let Java's builtin collections classes handle most of the optimization for you by taking advantage of a HashSet . The complexity of its contains method is O(1). I would highly recommend looking up how it achieves this because it's very interesting.

List<DataClass> a = object1.get("dbData");
HashSet<DataClass> b = new HashSet<>(object2.get("dbData"));
a.removeAll(b);
return a;

And it's all done for you.

EDIT: caveat

In order for this to work, DataClass needs to implement Object::hashCode . Otherwise, you can't use any of the hash-based collection algorithms.

EDIT 2: implementing hashCode

An object's hash code does not need to change every time an instance variable changes. The hash code only needs to reflect the instance variables that determine equality .

For example, imagine each object had a unique field private final UUID id . In this case, you could determine if two objects were the same by simply testing the id value. Fields like lastUpdateTime and lastModifiedUser would provide information about the object , but two instances with the same id would refer to the same object, even if the lastUpdateTime and lastModifiedUser of each were different .

The point is that if you really want to want to optimize this, include as few fields as possible in the hash computation. From your example, it seems like categoryPosition and subCategoryPosition might be enough.

Whatever fields you choose to include, the simplest way to compute a hash code from them is to use Objects::hash rather than running the numbers yourself.

It is a Set AB operation(only retain elements in Set A that are not in Set B = AB)

If using Set is fine then we can do like below. We can use ArrayList as well in place of Set but in AL case for each element to remove/retain check it needs to go through an entire other list scan.

Set<DataClass> a = new HashSet<>(object1.get("dbData"));
Set<DataClass> b = new HashSet<>(object2.get("dbData"));
a.removeAll(b);

If ordering is needed, use TreeSet.

Try to return a set from object1.get("dbData") and object2.get("dbData") that skips one more intermediate collection creation.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM