Is there a simple way in Java to get the difference between two collections using a custom equals function without overriding the equals?

Question

I'm open to using a library. I just want something simple to diff two collections on a criterion other than the normal equals function.

Right now I use something like:

collection1.stream()
           .filter(element -> !collection2.stream()
                                          .anyMatch(element2 -> element2.equalsWithoutSomeField(element)))
           .collect(Collectors.toSet());

and I would like something like:

Collections.diff(collection1, collection2, Foo::equalsWithoutSomeField);

(edit) More context:

I should have mentioned that I'm looking for something that already exists rather than coding it myself. I might write a small utility from your ideas if nothing exists.

Also, real duplicates aren't possible in my case: the collections are Sets. However, duplicates according to the custom equals are possible and should not be removed by this operation. That seems to be a limitation of many of the possible solutions.

Answer 1

We use similar methods in our project to shorten repetitive collection filtering. We started with some basic building blocks:

static <T> boolean anyMatch(Collection<T> set, Predicate<T> match) {
    for (T object : set)
        if (match.test(object))
            return true;
    return false;
}

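Based on this, we can easily implement methods like noneMatch and more complicated ones like isSubset or your diff (the version here is the symmetric difference: it keeps the elements of either collection that have no match in the other one):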

static <E> Collection<E> disjunctiveUnion(Collection<E> c1, Collection<E> c2, BiPredicate<E, E> match)
{
    ArrayList<E> diff = new ArrayList<>();
    diff.addAll(c1);
    diff.addAll(c2);
    diff.removeIf(e -> anyMatch(c1, e1 -> match.test(e, e1)) 
                       && anyMatch(c2, e2 -> match.test(e, e2)));
    return diff;
}
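Since the question asks for a one-sided difference rather than the symmetric one, here is a minimal sketch of how the same building blocks might be reused for that. The class name MatchUtils and the method difference are hypothetical, and the case-insensitive string comparison only stands in for equalsWithoutSomeField.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.List;
import java.util.function.BiPredicate;
import java.util.function.Predicate;

// Hypothetical utility class; the names are illustrative, not from a library.
class MatchUtils {

    static <T> boolean anyMatch(Collection<T> items, Predicate<T> match) {
        for (T item : items)
            if (match.test(item))
                return true;
        return false;
    }

    static <T> boolean noneMatch(Collection<T> items, Predicate<T> match) {
        return !anyMatch(items, match);
    }

    // Elements of c1 that have no match in c2. Elements of c1 that match
    // each other are all kept, because nothing is deduplicated.
    static <E> List<E> difference(Collection<E> c1, Collection<E> c2, BiPredicate<E, E> match) {
        List<E> diff = new ArrayList<>(c1);
        diff.removeIf(e -> anyMatch(c2, e2 -> match.test(e, e2)));
        return diff;
    }

    public static void main(String[] args) {
        List<String> left = Arrays.asList("apple", "Apple", "pear");
        List<String> right = Arrays.asList("PEAR", "plum");
        // Case-insensitive comparison stands in for equalsWithoutSomeField.
        System.out.println(difference(left, right, String::equalsIgnoreCase)); // [apple, Apple]
    }
}
```

Returning a List instead of a Set is a deliberate choice in this sketch: it keeps elements that are duplicates under the custom match, which is what the question requires.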

Note that there is certainly room for performance tuning. But keeping the logic separated into small methods helps with understanding and using them. Used in code, they read quite nicely.

You would then use it as you already said:

CollectionUtils.disjunctiveUnion(collection1, collection2, Foo::equalsWithoutSomeField);

Taking Jose Da Silva's suggestion into account, you could even use Comparator to build your criteria on the fly:

Comparator<Foo> special = Comparator.comparing(Foo::thisField)
                                    .thenComparing(Foo::thatField);
BiPredicate<Foo, Foo> specialMatch = (e1, e2) -> special.compare(e1, e2) == 0;
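To make the Comparator idea concrete, here is a small self-contained sketch. The Foo class, its fields, and the matchOn helper are assumptions made for illustration; only Comparator and BiPredicate come from the JDK.

```java
import java.util.Comparator;
import java.util.function.BiPredicate;

class ComparatorMatchDemo {

    // Hypothetical stand-in for the question's Foo; field names are illustrative.
    static final class Foo {
        final String thisField;
        final int thatField;
        final String ignoredField;

        Foo(String thisField, int thatField, String ignoredField) {
            this.thisField = thisField;
            this.thatField = thatField;
            this.ignoredField = ignoredField;
        }

        String thisField() { return thisField; }
        int thatField() { return thatField; }
    }

    // Two elements match when the comparator ranks them as equal.
    static <T> BiPredicate<T, T> matchOn(Comparator<? super T> comparator) {
        return (a, b) -> comparator.compare(a, b) == 0;
    }

    public static void main(String[] args) {
        Comparator<Foo> special = Comparator.comparing(Foo::thisField)
                                            .thenComparingInt(Foo::thatField);
        BiPredicate<Foo, Foo> specialMatch = matchOn(special);

        // Same compared fields, different ignoredField: still a match.
        System.out.println(specialMatch.test(new Foo("x", 1, "left"), new Foo("x", 1, "right"))); // true
        // A compared field differs: no match.
        System.out.println(specialMatch.test(new Foo("x", 1, "left"), new Foo("x", 2, "left")));  // false
    }
}
```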

Answer 2

You can use UnifiedSetWithHashingStrategy from Eclipse Collections. UnifiedSetWithHashingStrategy allows you to create a Set with a custom HashingStrategy. A HashingStrategy lets the user supply a custom hashCode() and equals(). The object's own hashCode() and equals() are not used.
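For readers without Eclipse Collections on the classpath, the idea behind a function-based hashing strategy can be approximated with the plain JDK: hash and compare a derived key instead of the object itself. The sketch below is only an analogy, not the Eclipse Collections API; the class KeyedDifference and the method differenceBy are hypothetical names, and "first last" strings stand in for Person objects.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.List;
import java.util.Set;
import java.util.function.Function;
import java.util.stream.Collectors;

// Plain-JDK analogy of hashing by a derived key; not Eclipse Collections code.
class KeyedDifference {

    // Keeps every element of 'minuend' whose key does not occur among the keys
    // of 'subtrahend'. The key type K must have sensible equals() and hashCode().
    static <T, K> List<T> differenceBy(Collection<T> minuend,
                                       Collection<T> subtrahend,
                                       Function<T, K> keyOf) {
        Set<K> excluded = subtrahend.stream().map(keyOf).collect(Collectors.toSet());
        return minuend.stream()
                      .filter(item -> !excluded.contains(keyOf.apply(item)))
                      .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // "first last" strings stand in for Person objects.
        List<String> people1 = Arrays.asList("A A", "B B", "C A");
        List<String> people2 = Arrays.asList("B B", "A D", "E E");
        Function<String, String> lastName = p -> p.split(" ")[1];

        // "B B" is dropped because last name B occurs in people2;
        // "A A" and "C A" share a last name and are both kept.
        System.out.println(differenceBy(people1, people2, lastName)); // [A A, C A]
    }
}
```

Because only the subtrahend's keys go into a Set, elements of the first collection that share a key are not collapsed.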

Edit based on a requirement from the OP via comment:

You can use reject() or removeIf() depending on your requirement.

Code Example:

// Common code
Person person1 = new Person("A", "A");
Person person2 = new Person("B", "B");
Person person3 = new Person("C", "A");
Person person4 = new Person("A", "D");
Person person5 = new Person("E", "E");

MutableSet<Person> personSet1 = Sets.mutable.with(person1, person2, person3);
MutableSet<Person> personSet2 = Sets.mutable.with(person2, person4, person5);

HashingStrategy<Person> hashingStrategy =
    HashingStrategies.fromFunction(Person::getLastName);

1) Using reject(): creates a new Set which contains all the elements that do not satisfy the Predicate.

@Test
public void reject()
{
    MutableSet<Person> personHashingStrategySet = HashingStrategySets.mutable.withAll(
        hashingStrategy, personSet2);

    // reject creates a new copy
    MutableSet<Person> rejectSet = personSet1.reject(personHashingStrategySet::contains);
    Assert.assertEquals(Sets.mutable.with(person1, person3), rejectSet);
}

2) Using removeIf(): mutates the original Set by removing the elements that satisfy the Predicate.

@Test
public void removeIfTest()
{
    MutableSet<Person> personHashingStrategySet = HashingStrategySets.mutable.withAll(
        hashingStrategy, personSet2);

    // removeIf mutates the personSet1
    personSet1.removeIf(personHashingStrategySet::contains);
    Assert.assertEquals(Sets.mutable.with(person1, person3), personSet1);
}

Answer before requirement from OP via comment: Kept for reference if others might find it useful.

3) Using Sets.differenceInto() API available in Eclipse Collections:

In the code below, personSet1 and personSet2 are the two sets, which use Person's equals() and hashCode(). The differenceSet is a UnifiedSetWithHashingStrategy, so it uses the last-name hashingStrategy to define uniqueness. Hence, even though personSet2 does not contain person3, person3 has the same lastName as person1, so the differenceSet contains only person1.

@Test
public void differenceTest()
{
    MutableSet<Person> differenceSet = Sets.differenceInto(
        HashingStrategySets.mutable.with(hashingStrategy),
        personSet1,
        personSet2);

    Assert.assertEquals(Sets.mutable.with(person1), differenceSet);
}

Person class common to all code blocks:

public class Person
{
    private final String firstName;
    private final String lastName;

    public Person(String firstName, String lastName)
    {
        this.firstName = firstName;
        this.lastName = lastName;
    }

    public String getFirstName()
    {
        return firstName;
    }

    public String getLastName()
    {
        return lastName;
    }

    @Override
    public boolean equals(Object o)
    {
        if (this == o)
        {
            return true;
        }
        if (o == null || getClass() != o.getClass())
        {
            return false;
        }
        Person person = (Person) o;
        return Objects.equals(firstName, person.firstName) &&
                Objects.equals(lastName, person.lastName);
    }

    @Override
    public int hashCode()
    {
        return Objects.hash(firstName, lastName);
    }
}

Javadocs: MutableSet, UnifiedSet, UnifiedSetWithHashingStrategy, HashingStrategy, Sets, reject, removeIf

Note: I am a committer on Eclipse Collections

Answer 3

Comparing

You can achieve this without any library, using just Java's Comparator.

For instance, with the following object

public class AA {
    private String a;
    private Double b;
    private String c;
    private int d;
    // getters and setters
}

You can use a comparator like

Comparator<AA> comparator = Comparator.comparing(AA::getA)
        .thenComparing(AA::getB)
        .thenComparingInt(AA::getD);

This compares the fields a, b and the int d, skipping c.

The only problem here is that this won't work with null values: comparing a null field throws a NullPointerException.

Comparing nulls

One possible solution for fine-grained configuration, that is, allowing null checks on specific fields, is a Comparator class similar to:

// Comparator for properties only, written only to be used with Comparator#comparing
public final class PropertyNullComparator<T extends Comparable<? super T>> 
                                            implements Comparator<Object> {
    private PropertyNullComparator() {  }
    public static <T extends Comparable<? super T>> PropertyNullComparator<T> of() {
        return new PropertyNullComparator<>();
    }
    @Override
    public int compare(Object o1, Object o2) {
        if (o1 != null && o2 != null) {
            if (o1 instanceof Comparable) {
                @SuppressWarnings({ "unchecked" })
                Comparable<Object> comparable = (Comparable<Object>) o1;
                return comparable.compareTo(o2);
            } else {
                // this will throw a ClassCastException when the object is not Comparable
                @SuppressWarnings({ "unchecked" })
                Comparable<Object> comparable = (Comparable<Object>) o2;
                return comparable.compareTo(o1) * -1; // * -1 to keep order
            }
        } else {
            return o1 == o2 ? 0 : (o1 == null ? -1 : 1); // nulls first
        }
    }
}

This way you can use a comparator specifying the allowed null fields.

Comparator<AA> comparator = Comparator.comparing(AA::getA)
        .thenComparing(AA::getB, PropertyNullComparator.of())
        .thenComparingInt(AA::getD);

If you don't want to define a custom comparator you can use something like:

Comparator<AA> comparator = Comparator.comparing(AA::getA)
        .thenComparing(AA::getB, Comparator.nullsFirst(Comparator.naturalOrder()))
        .thenComparingInt(AA::getD);
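As a quick check of the null-tolerant variant, the following sketch builds that comparator for a cut-down AA class (only the compared fields a, b and d are modelled, which is a simplification) and compares instances whose b field is null.

```java
import java.util.Comparator;

class NullSafeFieldDemo {

    // Cut-down version of AA with only the compared fields.
    static final class AA {
        private final String a;
        private final Double b; // may be null
        private final int d;

        AA(String a, Double b, int d) {
            this.a = a;
            this.b = b;
            this.d = d;
        }

        String getA() { return a; }
        Double getB() { return b; }
        int getD() { return d; }
    }

    // Null values of b sort first instead of causing a NullPointerException.
    static Comparator<AA> nullSafe() {
        return Comparator.comparing(AA::getA)
                .thenComparing(AA::getB, Comparator.nullsFirst(Comparator.naturalOrder()))
                .thenComparingInt(AA::getD);
    }

    public static void main(String[] args) {
        Comparator<AA> comparator = nullSafe();
        // Both b fields are null: the two objects compare as equal.
        System.out.println(comparator.compare(new AA("x", null, 1), new AA("x", null, 1)) == 0); // true
        // A null b sorts before a non-null b.
        System.out.println(comparator.compare(new AA("x", null, 1), new AA("x", 2.0, 1)) < 0);   // true
    }
}
```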

Difference method

The difference (A - B) method could be implemented using two TreeSets .

static <T> TreeSet<T> difference(Collection<T> c1, 
                                 Collection<T> c2, 
                                 Comparator<T> comparator) {
    TreeSet<T> treeSet1 = new TreeSet<>(comparator); treeSet1.addAll(c1);
    if (treeSet1.size() > c2.size()) {
        treeSet1.removeAll(c2);
    } else {
        TreeSet<T> treeSet2 = new TreeSet<>(comparator); treeSet2.addAll(c2);
        treeSet1.removeAll(treeSet2);
    }
    return treeSet1;
}

Note: a TreeSet makes sense here since we are talking about uniqueness under a specific comparator. It could also perform better: the contains method of TreeSet is O(log(n)), compared to O(n) for a common ArrayList.

Why is the second TreeSet only created when treeSet1.size() > c2.size() is false? When treeSet1 is larger, TreeSet#removeAll iterates over c2 and calls remove on treeSet1, which uses the custom comparator. When the condition is not met, removeAll uses the contains method of the second collection instead. That collection could be any Java collection, and its contains method is not guaranteed to behave the same as the contains of the first TreeSet (with the custom comparator), so it is first copied into a TreeSet with the same comparator.
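One way to avoid depending on the relative sizes at all is to skip removeAll and call remove once per element, since TreeSet.remove always locates elements through the set's comparator. This is a sketch of that alternative, not the method from this answer; the class name TreeSetDifference is hypothetical, and String.CASE_INSENSITIVE_ORDER stands in for a custom comparator.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.Comparator;
import java.util.TreeSet;

class TreeSetDifference {

    // c1 minus c2 under the given comparator, independent of collection sizes.
    // Like any TreeSet-based approach, it collapses elements of c1 that the
    // comparator considers equal.
    static <T> TreeSet<T> difference(Collection<T> c1,
                                     Collection<T> c2,
                                     Comparator<? super T> comparator) {
        TreeSet<T> result = new TreeSet<>(comparator);
        result.addAll(c1);
        for (T item : c2) {
            result.remove(item); // lookup goes through the comparator
        }
        return result;
    }

    public static void main(String[] args) {
        Comparator<String> ignoreCase = String.CASE_INSENSITIVE_ORDER;

        // Second collection smaller than the first.
        System.out.println(difference(Arrays.asList("a", "b", "c"),
                                      Arrays.asList("A"), ignoreCase)); // [b, c]

        // Second collection larger than the first: same result.
        System.out.println(difference(Arrays.asList("a", "b", "c"),
                                      Arrays.asList("A", "X", "Y", "Z"), ignoreCase)); // [b, c]
    }
}
```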

Edit (given the additional context in the question)

Since collection1 is a set that could contain repeated elements according to the custom equals (not the equals of the object), the solution already provided in the question could be used, since it does exactly that, without modifying either input collection and creating a new output set.

So you can create your own static function (because I, at least, am not aware of a library that provides a similar method) and use the Comparator or a BiPredicate.

static <T> Set<T> difference(Collection<T> collection1, 
                             Collection<T> collection2, 
                             Comparator<T> comparator) {
    // keep the elements of collection1 that compare as equal to nothing in collection2
    return collection1.stream()
            .filter(element1 -> !collection2.stream()
                    .anyMatch(element2 -> comparator.compare(element1, element2) == 0))
            .collect(Collectors.toSet());
}

Edit (To Eugene)

"Why would you want to implement a null safe comparator yourself"

At least to my knowledge, there isn't a ready-made comparator for comparing fields that may simply be null. The closest that I know of (to replace my suggested PropertyNullComparator.of(); a clearer/shorter/better name can be used) is:

Comparator.nullsFirst(Comparator.naturalOrder())

So you would have to write that line for every field that you want to compare. Is this doable? Of course it is. Is it practical? I think not.

Easy solution, create a helper method.

static class ComparatorUtils {
    public static <T extends Comparable<? super T>> Comparator<T> shnp() { // super short null comparator
        return Comparator.nullsFirst(Comparator.<T>naturalOrder());
    }
}

Does this work? Yes, it does. Is it practical? It looks like it. Is it a great solution? That depends: many people consider the exaggerated (and/or unnecessary) use of helper methods an anti-pattern (see a good old article by Nick Malik). Several reasons are listed there, but in short, this is an OO language, so OO solutions are normally preferred to static helper methods.

"As stated in the documentation : Note that the ordering maintained by a set (whether or not an explicit comparator is provided must be consistent with equals if it is to correctly implement the Set interface. Further, the same problem would arise in the other case, when size() > c.size() because ultimately this would still call equals in the remove method. So they both have to implement Comparator and equals consistently for this to work correctly"

The TreeSet javadoc says the following, but with a clear "if":

Note that the ordering maintained by a set (whether or not an explicit comparator is provided) must be consistent with equals if it is to correctly implement the Set interface

Then says this:

See Comparable or Comparator for a precise definition of consistent with equals

If you go to the Comparable javadoc, it says:

It is strongly recommended (though not required) that natural orderings be consistent with equals

If we continue reading the TreeSet javadoc (in the same paragraph as the first quote), it says the following:

This is so because the Set interface is defined in terms of the equals operation, but a TreeSet instance performs all key comparisons using its compareTo (or compare ) method, so two keys that are deemed equal by this method are, from the standpoint of the set, equal. The behavior of a set is well-defined even if its ordering is inconsistent with equals; it just fails to obey the general contract of the Set interface.

From this last quote, and with a very simple debugging session or even just a reading of the source, you can see that TreeSet uses an internal TreeMap and that all its derived methods are based on the comparator, not on the equals method.
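A few lines are enough to observe this. The sketch below uses the JDK's String.CASE_INSENSITIVE_ORDER as a comparator that is deliberately inconsistent with String.equals; the class and method names are hypothetical.

```java
import java.util.Set;
import java.util.TreeSet;

class TreeSetEqualityDemo {

    // A set whose notion of "same element" comes from the comparator only.
    static Set<String> caseInsensitiveSet() {
        return new TreeSet<>(String.CASE_INSENSITIVE_ORDER);
    }

    public static void main(String[] args) {
        Set<String> names = caseInsensitiveSet();
        names.add("alice");

        System.out.println("alice".equals("ALICE")); // false: equals() is case-sensitive
        System.out.println(names.contains("ALICE")); // true: the comparator decides
        System.out.println(names.add("Alice"));      // false: treated as already present
        System.out.println(names);                   // [alice]
    }
}
```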

"Why is this so implemented? because there is a difference when removing many elements from a little set and the other way around, as a matter of fact same stands for addAll"

If you look at the definition of removeAll, you can see that its implementation is in AbstractSet; TreeSet does not override it. This implementation uses the contains method of the argument collection when that collection is at least as large as the set. The behavior of that contains is uncertain: it is neither necessary nor probable that the received collection (e.g. a list, a queue, etc.) has, or even can define, the same comparator.

Update 1: This JDK bug is being discussed (and considered for a fix) here: https://bugs.openjdk.java.net/browse/JDK-6394757

Answer 4

static <T> Collection<T> diff(Collection<T> minuend, Collection<T> subtrahend, BiPredicate<T, T> equals) {
    Set<Wrapper<T>> w1 = minuend.stream().map(item -> new Wrapper<>(item, equals)).collect(Collectors.toSet());
    Set<Wrapper<T>> w2 = subtrahend.stream().map(item -> new Wrapper<>(item, equals)).collect(Collectors.toSet());
    w1.removeAll(w2);
    return w1.stream().map(w -> w.item).collect(Collectors.toList());
}

static class Wrapper<T> {
    T item;
    BiPredicate<T, T> equals;

    Wrapper(T item, BiPredicate<T, T> equals) {
        this.item = item;
        this.equals = equals;
    }

    @Override
    public int hashCode() {
        // all items have same hash code, check equals
        return 1;
    }

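    @Override
    @SuppressWarnings("unchecked")
    public boolean equals(Object that) {
        // guard against null and foreign types before the unchecked cast
        return that instanceof Wrapper && equals.test(this.item, ((Wrapper<T>) that).item);
    }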
}

Answer 5

pom.xml:

<dependency>
    <groupId>org.apache.commons</groupId>
    <artifactId>commons-collections4</artifactId>
    <version>4.4</version>
</dependency>

code/test:

package com.my;

import lombok.Builder;
import lombok.Getter;
import lombok.ToString;
import org.apache.commons.collections4.CollectionUtils;
import org.apache.commons.collections4.Equator;

import java.util.Collection;
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;
import java.util.function.Function;

public class Diff {

    public static class FieldEquator<T> implements Equator<T> {
        private final Function<T, Object>[] functions;

        @SafeVarargs
        public FieldEquator(Function<T, Object>... functions) {
            if (Objects.isNull(functions) || functions.length < 1) {
                throw new UnsupportedOperationException();
            }
            this.functions = functions;
        }

        @Override
        public boolean equate(T o1, T o2) {
            if (Objects.isNull(o1) && Objects.isNull(o2)) {
                return true;
            }
            if (Objects.isNull(o1) || Objects.isNull(o2)) {
                return false;
            }
            for (Function<T, ?> function : functions) {
                if (!Objects.equals(function.apply(o1), function.apply(o2))) {
                    return false;
                }
            }
            return true;
        }

        @Override
        public int hash(T o) {
            if (Objects.isNull(o)) {
                return -1;
            }
            int i = 0;
            Object[] vals = new Object[functions.length];
            for (Function<T, Object> function : functions) {
                vals[i] = function.apply(o);
                i++;
            }
            return Objects.hash(vals);
        }
    }

    @SafeVarargs
    private static <T> Set<T> difference(Collection<T> a, Collection<T> b, Function<T, Object>... functions) {
        if ((Objects.isNull(a) || a.isEmpty()) && Objects.nonNull(b) && !b.isEmpty()) {
            return new HashSet<>(b);
        } else if ((Objects.isNull(b) || b.isEmpty()) && Objects.nonNull(a) && !a.isEmpty()) {
            return new HashSet<>(a);
        }

        Equator<T> eq = new FieldEquator<>(functions);

        Collection<T> res = CollectionUtils.removeAll(a, b, eq);
        res.addAll(CollectionUtils.removeAll(b, a, eq));

        return new HashSet<>(res);
    }

    /**
     * Test
     */

    @Builder
    @Getter
    @ToString
    public static class A {
        String a;
        String b;
        String c;
    }

    public static void main(String[] args) {
        Set<A> as1 = new HashSet<>();
        Set<A> as2 = new HashSet<>();

        A a1 = A.builder().a("1").b("1").c("1").build();
        A a2 = A.builder().a("1").b("1").c("2").build();
        A a3 = A.builder().a("2").b("1").c("1").build();
        A a4 = A.builder().a("1").b("3").c("1").build();
        A a5 = A.builder().a("1").b("1").c("1").build();
        A a6 = A.builder().a("1").b("1").c("2").build();
        A a7 = A.builder().a("1").b("1").c("6").build();

        as1.add(a1);
        as1.add(a2);
        as1.add(a3);

        as2.add(a4);
        as2.add(a5);
        as2.add(a6);
        as2.add(a7);

        System.out.println("Set1: " + as1);
        System.out.println("Set2: " + as2);

        // Check A::getA, A::getB ignore A::getC
        Collection<A> difference = difference(as1, as2, A::getA, A::getB);

        System.out.println("Diff: " + difference);
    }
}

result:

Set1: [Diff.A(a=2, b=1, c=1), Diff.A(a=1, b=1, c=1), Diff.A(a=1, b=1, c=2)]
Set2: [Diff.A(a=1, b=1, c=6), Diff.A(a=1, b=1, c=2), Diff.A(a=1, b=3, c=1), Diff.A(a=1, b=1, c=1)]
Diff: [Diff.A(a=1, b=3, c=1), Diff.A(a=2, b=1, c=1)]
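The same field-based comparison can also be sketched with the JDK alone, using a list of the selected field values as a composite key. This is an illustrative alternative and not part of commons-collections; FieldKeyDiff, keyOn, symmetricDifference and the Item class are hypothetical names, and Item is a hand-written stand-in for the Lombok-generated A above. It reuses the test data from this answer.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Function;
import java.util.stream.Collectors;

// JDK-only sketch; class and method names are hypothetical.
class FieldKeyDiff {

    // Minimal stand-in for the answer's Lombok-generated class A.
    static final class Item {
        final String a, b, c;
        Item(String a, String b, String c) { this.a = a; this.b = b; this.c = c; }
        String getA() { return a; }
        String getB() { return b; }
        @Override public String toString() { return "Item(" + a + "," + b + "," + c + ")"; }
    }

    // Composite key: the selected field values in order.
    // List provides value-based equals() and hashCode().
    @SafeVarargs
    static <T> Function<T, List<Object>> keyOn(Function<T, ?>... fields) {
        return item -> Arrays.stream(fields)
                             .map(field -> (Object) field.apply(item))
                             .collect(Collectors.toList());
    }

    // Elements of either collection whose key does not occur in the other one.
    @SafeVarargs
    static <T> Set<T> symmetricDifference(Collection<T> a, Collection<T> b,
                                          Function<T, ?>... fields) {
        Function<T, List<Object>> key = keyOn(fields);
        Set<List<Object>> keysA = a.stream().map(key).collect(Collectors.toSet());
        Set<List<Object>> keysB = b.stream().map(key).collect(Collectors.toSet());
        Set<T> result = new HashSet<>();
        for (T item : a) if (!keysB.contains(key.apply(item))) result.add(item);
        for (T item : b) if (!keysA.contains(key.apply(item))) result.add(item);
        return result;
    }

    public static void main(String[] args) {
        List<Item> as1 = Arrays.asList(new Item("1", "1", "1"), new Item("1", "1", "2"),
                                       new Item("2", "1", "1"));
        List<Item> as2 = Arrays.asList(new Item("1", "3", "1"), new Item("1", "1", "1"),
                                       new Item("1", "1", "2"), new Item("1", "1", "6"));
        // Compare on a and b, ignoring c. The result holds Item(2,1,1) and
        // Item(1,3,1), in unspecified order.
        System.out.println(symmetricDifference(as1, as2, Item::getA, Item::getB));
    }
}
```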

5 answers

solution1
4 2018-03-16 16:10:10

solution2
3 ACCEPTED 2018-03-17 03:31:16

solution3
0 2018-03-16 15:47:16

solution4
0 2018-03-16 18:43:07

solution5
0 2022-09-12 15:35:07