I'm open to using a library. I just want something simple to diff two collections on a criterion other than the normal equals function.
Right now I use something like:
collection1.stream()
.filter(element -> !collection2.stream()
.anyMatch(element2 -> element2.equalsWithoutSomeField(element)))
.collect(Collectors.toSet());
and I would like something like:
Collections.diff(collection1, collection2, Foo::equalsWithoutSomeField);
(edit) More context:
I should have mentioned that I'm looking for something that already exists rather than coding it myself. I might write a small utility from your ideas if nothing exists.
Also, real duplicates aren't possible in my case: the collections are Sets. However, duplicates according to the custom equals are possible and should not be removed by this operation. This seems to be a limitation of many of the possible solutions.
We use similar methods in our project to shorten repetitive collection filtering. We started with some basic building blocks:
static <T> boolean anyMatch(Collection<T> set, Predicate<T> match) {
for (T object : set)
if (match.test(object))
return true;
return false;
}
Based on this, we can easily implement methods like noneMatch, more complicated ones like isSubset, or your diff:
static <E> Collection<E> disjunctiveUnion(Collection<E> c1, Collection<E> c2, BiPredicate<E, E> match)
{
ArrayList<E> diff = new ArrayList<>();
diff.addAll(c1);
diff.addAll(c2);
diff.removeIf(e -> anyMatch(c1, e1 -> match.test(e, e1))
&& anyMatch(c2, e2 -> match.test(e, e2)));
return diff;
}
Note that there is certainly room for performance tuning. But keeping the logic separated into small methods helps with understanding them and makes them easy to use. Used in code, they read quite nicely.
You would then use it as you already said:
CollectionUtils.disjunctiveUnion(collection1, collection2, Foo::equalsWithoutSomeField);
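The noneMatch and isSubset helpers mentioned above are not shown in this answer. The following is a minimal sketch of how they might be built on anyMatch, together with a one-sided difference (elements of c1 with no match in c2), which is closer to what the question asks for than the symmetric disjunctiveUnion. The class name MatchUtils and the exact signatures are assumptions for illustration, not the author's code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.List;
import java.util.function.BiPredicate;
import java.util.function.Predicate;

// Hypothetical utility class; names and signatures are illustrative only.
final class MatchUtils {

    static <T> boolean anyMatch(Collection<T> set, Predicate<T> match) {
        for (T object : set) {
            if (match.test(object)) {
                return true;
            }
        }
        return false;
    }

    static <T> boolean noneMatch(Collection<T> set, Predicate<T> match) {
        return !anyMatch(set, match);
    }

    // True if every element of sub has at least one match in sup.
    static <E> boolean isSubset(Collection<E> sub, Collection<E> sup, BiPredicate<E, E> match) {
        return noneMatch(sub, e -> noneMatch(sup, s -> match.test(e, s)));
    }

    // One-sided difference: elements of c1 without a match in c2.
    // Elements of c1 that match each other are all kept.
    static <E> List<E> difference(Collection<E> c1, Collection<E> c2, BiPredicate<E, E> match) {
        List<E> diff = new ArrayList<>(c1);
        diff.removeIf(e -> anyMatch(c2, e2 -> match.test(e, e2)));
        return diff;
    }

    public static void main(String[] args) {
        BiPredicate<String, String> ignoreCase = String::equalsIgnoreCase;
        List<String> c1 = Arrays.asList("a", "A", "b");
        List<String> c2 = Arrays.asList("B", "c");
        System.out.println(difference(c1, c2, ignoreCase));               // prints [a, A]
        System.out.println(isSubset(Arrays.asList("C"), c2, ignoreCase)); // prints true
    }
}
```

Note that "a" and "A" both survive even though they match each other, which is the behavior the question asks for.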
Taking Jose Da Silva's suggestion into account, you could even use Comparator to build your criteria on the fly:
Comparator<Foo> special = Comparator.comparing(Foo::thisField)
    .thenComparing(Foo::thatField);
BiPredicate<Foo, Foo> specialMatch = (e1, e2) -> special.compare(e1, e2) == 0;
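To make the Comparator-to-BiPredicate idea concrete, here is a small self-contained sketch. The Foo class, its fields, and the matching helper are all hypothetical; only the field names mirror the snippet above.

```java
import java.util.Comparator;
import java.util.function.BiPredicate;

final class ComparatorMatchSketch {

    // Hypothetical value class; "ignored" is the field left out of the comparison.
    static final class Foo {
        private final String thisField;
        private final int thatField;
        private final String ignored;

        Foo(String thisField, int thatField, String ignored) {
            this.thisField = thisField;
            this.thatField = thatField;
            this.ignored = ignored;
        }

        String thisField() { return thisField; }
        int thatField() { return thatField; }
        String ignored() { return ignored; }
    }

    // Adapts any Comparator into an equality-style BiPredicate.
    static <T> BiPredicate<T, T> matching(Comparator<? super T> comparator) {
        return (a, b) -> comparator.compare(a, b) == 0;
    }

    static final Comparator<Foo> SPECIAL = Comparator.comparing(Foo::thisField)
            .thenComparingInt(Foo::thatField);

    public static void main(String[] args) {
        BiPredicate<Foo, Foo> specialMatch = matching(SPECIAL);
        // Differ only in the ignored field, so they match.
        System.out.println(specialMatch.test(new Foo("x", 1, "p"), new Foo("x", 1, "q"))); // prints true
        // Differ in a compared field, so they do not match.
        System.out.println(specialMatch.test(new Foo("x", 1, "p"), new Foo("x", 2, "p"))); // prints false
    }
}
```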
You can use UnifiedSetWithHashingStrategy from Eclipse Collections. UnifiedSetWithHashingStrategy allows you to create a Set with a custom HashingStrategy. HashingStrategy allows the user to supply a custom hashCode() and equals(). The Object's own hashCode() and equals() are not used.
Edit based on requirement from OP via comment:
You can use reject() or removeIf() depending on your requirement.
Code Example:
// Common code
Person person1 = new Person("A", "A");
Person person2 = new Person("B", "B");
Person person3 = new Person("C", "A");
Person person4 = new Person("A", "D");
Person person5 = new Person("E", "E");
MutableSet<Person> personSet1 = Sets.mutable.with(person1, person2, person3);
MutableSet<Person> personSet2 = Sets.mutable.with(person2, person4, person5);
HashingStrategy<Person> hashingStrategy =
HashingStrategies.fromFunction(Person::getLastName);
1) Using reject(): Creates a new Set which contains all the elements which do not satisfy the Predicate.
@Test
public void reject()
{
MutableSet<Person> personHashingStrategySet = HashingStrategySets.mutable.withAll(
hashingStrategy, personSet2);
// reject creates a new copy
MutableSet<Person> rejectSet = personSet1.reject(personHashingStrategySet::contains);
Assert.assertEquals(Sets.mutable.with(person1, person3), rejectSet);
}
2) Using removeIf(): Mutates the original Set by removing the elements which satisfy the Predicate.
@Test
public void removeIfTest()
{
MutableSet<Person> personHashingStrategySet = HashingStrategySets.mutable.withAll(
hashingStrategy, personSet2);
// removeIf mutates the personSet1
personSet1.removeIf(personHashingStrategySet::contains);
Assert.assertEquals(Sets.mutable.with(person1, person3), personSet1);
}
Answer before requirement from OP via comment: Kept for reference if others might find it useful.
3) Using the Sets.differenceInto() API available in Eclipse Collections:
In the code below, set1 and set2 are the two sets, which use Person's equals() and hashCode(). The differenceSet is a UnifiedSetWithHashingStrategy, so it uses the hashingStrategy (based on lastName) to define uniqueness. Hence, even though set2 does not contain person3, person3 has the same lastName as person1, so the differenceSet contains only person1.
@Test
public void differenceTest()
{
MutableSet<Person> differenceSet = Sets.differenceInto(
HashingStrategySets.mutable.with(hashingStrategy),
set1,
set2);
Assert.assertEquals(Sets.mutable.with(person1), differenceSet);
}
Person class common to both code blocks:
public class Person
{
private final String firstName;
private final String lastName;
public Person(String firstName, String lastName)
{
this.firstName = firstName;
this.lastName = lastName;
}
public String getFirstName()
{
return firstName;
}
public String getLastName()
{
return lastName;
}
@Override
public boolean equals(Object o)
{
if (this == o)
{
return true;
}
if (o == null || getClass() != o.getClass())
{
return false;
}
Person person = (Person) o;
return Objects.equals(firstName, person.firstName) &&
Objects.equals(lastName, person.lastName);
}
@Override
public int hashCode()
{
return Objects.hash(firstName, lastName);
}
}
Javadocs: MutableSet, UnifiedSet, UnifiedSetWithHashingStrategy, HashingStrategy, Sets, reject, removeIf
Note: I am a committer on Eclipse Collections
Comparing
You can achieve this without any library, using just Java's Comparator.
For instance, with the following object
public class AA {
private String a;
private Double b;
private String c;
private int d;
// getters and setters
}
You can use a comparator like
Comparator<AA> comparator = Comparator.comparing(AA::getA)
.thenComparing(AA::getB)
.thenComparingInt(AA::getD);
This compares the fields a, b and the int d, skipping c.
The only problem here is that this won't work with null values.
Comparing nulls
One possible solution that allows fine-grained configuration, that is, allowing nulls only for specific fields, is to use a Comparator class similar to:
// Comparator for property values only; written solely for use with Comparator#comparing
public final class PropertyNullComparator<T extends Comparable<? super T>>
implements Comparator<Object> {
private PropertyNullComparator() { }
public static <T extends Comparable<? super T>> PropertyNullComparator<T> of() {
return new PropertyNullComparator<>();
}
@Override
public int compare(Object o1, Object o2) {
if (o1 != null && o2 != null) {
if (o1 instanceof Comparable) {
@SuppressWarnings({ "unchecked" })
Comparable<Object> comparable = (Comparable<Object>) o1;
return comparable.compareTo(o2);
} else {
// this cast throws a ClassCastException when the object is not Comparable
@SuppressWarnings({ "unchecked" })
Comparable<Object> comparable = (Comparable<Object>) o2;
return comparable.compareTo(o1) * -1; // * -1 to keep order
}
} else {
return o1 == o2 ? 0 : (o1 == null ? -1 : 1); // nulls first
}
}
}
This way you can use a comparator specifying the allowed null fields.
Comparator<AA> comparator = Comparator.comparing(AA::getA)
.thenComparing(AA::getB, PropertyNullComparator.of())
.thenComparingInt(AA::getD);
If you don't want to define a custom comparator you can use something like:
Comparator<AA> comparator = Comparator.comparing(AA::getA)
.thenComparing(AA::getB, Comparator.nullsFirst(Comparator.naturalOrder()))
.thenComparingInt(AA::getD);
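The difference between the plain and the null-tolerant comparator can be seen in a short sketch. The AA class here is a hypothetical stand-in for the one above (field c is omitted for brevity), and the constant names are assumptions.

```java
import java.util.Comparator;

final class NullFieldComparatorSketch {

    // Hypothetical stand-in for the AA class above.
    static final class AA {
        private final String a;
        private final Double b;
        private final int d;

        AA(String a, Double b, int d) {
            this.a = a;
            this.b = b;
            this.d = d;
        }

        String getA() { return a; }
        Double getB() { return b; }
        int getD() { return d; }
    }

    // Throws NullPointerException if getB() returns null for the first argument.
    static final Comparator<AA> PLAIN = Comparator.comparing(AA::getA)
            .thenComparing(AA::getB)
            .thenComparingInt(AA::getD);

    // Tolerates null in b only; a null in a would still fail.
    static final Comparator<AA> NULL_SAFE = Comparator.comparing(AA::getA)
            .thenComparing(AA::getB, Comparator.nullsFirst(Comparator.naturalOrder()))
            .thenComparingInt(AA::getD);

    public static void main(String[] args) {
        AA withNull = new AA("x", null, 1);
        AA withValue = new AA("x", 2.0, 1);

        // Nulls sort first, so the result is negative.
        System.out.println(NULL_SAFE.compare(withNull, withValue) < 0); // prints true

        try {
            PLAIN.compare(withNull, withValue);
        } catch (NullPointerException e) {
            System.out.println("plain comparator rejected the null field");
        }
    }
}
```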
Difference method
The difference (A - B) method could be implemented using two TreeSets.
static <T> TreeSet<T> difference(Collection<T> c1,
Collection<T> c2,
Comparator<T> comparator) {
TreeSet<T> treeSet1 = new TreeSet<>(comparator); treeSet1.addAll(c1);
if (treeSet1.size() > c2.size()) {
treeSet1.removeAll(c2);
} else {
TreeSet<T> treeSet2 = new TreeSet<>(comparator); treeSet2.addAll(c2);
treeSet1.removeAll(treeSet2);
}
return treeSet1;
}
Note: a TreeSet makes sense here since we are talking about uniqueness under a specific comparator. It can also perform better: the contains method of TreeSet is O(log(n)), compared to O(n) for a common ArrayList.
Why is only one TreeSet used when treeSet1.size() > c2.size()? Because in that case TreeSet#removeAll iterates over c2 and calls remove on treeSet1, which uses the comparator. When the condition is not met, TreeSet#removeAll uses the contains method of the second collection. That collection could be any Java collection, and its contains method is not guaranteed to behave the same as the contains of the first TreeSet (with its custom comparator), so c2 is wrapped in a second TreeSet first.
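As an illustration of the reasoning above, the sketch below always wraps both collections in TreeSets with the same comparator, so that whichever branch removeAll takes, membership is decided by the comparator. This is a simplified variant of the method above, not a replacement for it, and the class name is an assumption. It also shows a consequence that matters for the question: elements of the first collection that compare as equal to each other are collapsed into one.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.Comparator;
import java.util.TreeSet;

final class TreeSetDifferenceSketch {

    // Both sides use the same comparator, so removeAll never falls back
    // to an equals-based contains on an arbitrary collection.
    static <T> TreeSet<T> difference(Collection<T> c1, Collection<T> c2, Comparator<T> comparator) {
        TreeSet<T> left = new TreeSet<>(comparator);
        left.addAll(c1);
        TreeSet<T> right = new TreeSet<>(comparator);
        right.addAll(c2);
        left.removeAll(right);
        return left;
    }

    public static void main(String[] args) {
        Comparator<String> ignoreCase = String.CASE_INSENSITIVE_ORDER;

        // Second collection smaller than the first.
        System.out.println(difference(Arrays.asList("a", "b", "c"), Arrays.asList("A"), ignoreCase));

        // Second collection larger than the first; same result.
        System.out.println(difference(Arrays.asList("a", "b", "c"), Arrays.asList("A", "x", "y", "z"), ignoreCase));

        // "a" and "A" compare as equal, so only the first one inserted survives.
        System.out.println(new ArrayList<>(difference(Arrays.asList("a", "A", "b"), Arrays.asList("B"), ignoreCase)));
    }
}
```

If duplicates under the custom comparison must be preserved, as in the question's edit, a TreeSet-based approach is therefore not suitable.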
Edit (given the additional context in the question)
Since collection1 is a set that could contain repeated elements according to the custom equals (not the equals of the object), the solution already provided in the question could be used, since it does exactly that without modifying any of the input collections, creating a new output set.
So you can create your own static function (at least I am not aware of a library that provides a similar method) and use a Comparator or a BiPredicate.
static <T> Set<T> difference(Collection<T> collection1,
                             Collection<T> collection2,
                             Comparator<T> comparator) {
    // keep every element of collection1 that compares unequal to all of collection2
    return collection1.stream()
        .filter(element1 -> !collection2.stream()
            .anyMatch(element2 -> comparator.compare(element1, element2) == 0))
        .collect(Collectors.toSet());
}
Edit (To Eugene)
"Why would you want to implement a null safe comparator yourself"
At least to my knowledge, there isn't a ready-made comparator for comparing fields that may simply be null. The closest I know of (to replace my suggested PropertyNullComparator.of() [a clearer/shorter/better name can be used]) is:
Comparator.nullsFirst(Comparator.naturalOrder())
So you would have to write that line for every field that you want to compare. Is this doable? Of course it is. Is it practical? I think not.
An easy solution is to create a helper method.
static class ComparatorUtils {
public static <T extends Comparable<? super T>> Comparator<T> shnp() { // super short null comparator
return Comparator.nullsFirst(Comparator.<T>naturalOrder());
}
}
Does this work? Yes, it works. Is it practical? It looks like it. Is it a great solution? Well, that depends: many people consider the exaggerated (and/or unnecessary) use of helper methods an anti-pattern (see a good old article by Nick Malik). There are some reasons listed there, but in short, this is an OO language, so OO solutions are normally preferred to static helper methods.
"As stated in the documentation : Note that the ordering maintained by a set (whether or not an explicit comparator is provided must be consistent with equals if it is to correctly implement the Set interface. Further, the same problem would arise in the other case, when size() > c.size() because ultimately this would still call equals in the remove method. So they both have to implement Comparator and equals consistently for this to work correctly"
The TreeSet javadoc says the following, but note the explicit if:
Note that the ordering maintained by a set (whether or not an explicit comparator is provided) must be consistent with equals if it is to correctly implement the Set interface
It then says:
See Comparable or Comparator for a precise definition of consistent with equals
The Comparable javadoc, in turn, says:
It is strongly recommended (though not required) that natural orderings be consistent with equals
Continuing in the Comparable javadoc (even in the same paragraph), it says the following:
This is so because the Set interface is defined in terms of the equals operation, but a TreeSet instance performs all key comparisons using its compareTo (or compare ) method, so two keys that are deemed equal by this method are, from the standpoint of the set, equal. The behavior of a set is well-defined even if its ordering is inconsistent with equals; it just fails to obey the general contract of the Set interface.
From this last quote, and with some very simple debugging or even just reading the code, you can see that TreeSet uses an internal TreeMap, and that all its derived methods are based on the comparator, not the equals method.
"Why is this so implemented? because there is a difference when removing many elements from a little set and the other way around, as a matter of fact same stands for addAll"
If you go to the definition of removeAll, you can see that its implementation is in AbstractSet; it is not overridden. This implementation uses contains from the argument collection when the argument is at least as large as the set. The behavior of this contains is uncertain: it isn't necessary (nor probable) that the received collection (e.g. list, queue, etc.) has or can define the same comparator.
Update 1: This JDK bug is being discussed (and considered for a fix) here: https://bugs.openjdk.java.net/browse/JDK-6394757
static <T> Collection<T> diff(Collection<T> minuend, Collection<T> subtrahend, BiPredicate<T, T> equals) {
Set<Wrapper<T>> w1 = minuend.stream().map(item -> new Wrapper<>(item, equals)).collect(Collectors.toSet());
Set<Wrapper<T>> w2 = subtrahend.stream().map(item -> new Wrapper<>(item, equals)).collect(Collectors.toSet());
w1.removeAll(w2);
return w1.stream().map(w -> w.item).collect(Collectors.toList());
}
static class Wrapper<T> {
T item;
BiPredicate<T, T> equals;
Wrapper(T item, BiPredicate<T, T> equals) {
this.item = item;
this.equals = equals;
}
@Override
public int hashCode() {
// all items have same hash code, check equals
return 1;
}
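@Override
@SuppressWarnings("unchecked")
public boolean equals(Object that) {
// guard against null and foreign types before the unchecked cast
return that instanceof Wrapper && equals.test(this.item, ((Wrapper<T>) that).item);
}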
}
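The constant hashCode in the wrapper above puts every element into the same hash bucket, and wrapping the minuend in a Set collapses elements that match each other under the custom equality. If the custom equality can be expressed as ordinary equality of some derived key (for example, a list of the fields that matter), a key-based variant might avoid both issues. The following is only a sketch under that assumption; the class name KeyedDiffSketch and the key function are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.function.Function;

final class KeyedDiffSketch {

    // Keeps every element of minuend whose key does not occur among the
    // keys of subtrahend. Elements of minuend sharing a key are all kept.
    static <T, K> List<T> diff(Collection<T> minuend, Collection<T> subtrahend, Function<T, K> key) {
        Set<K> subtrahendKeys = new HashSet<>();
        for (T item : subtrahend) {
            subtrahendKeys.add(key.apply(item));
        }
        List<T> result = new ArrayList<>();
        for (T item : minuend) {
            if (!subtrahendKeys.contains(key.apply(item))) {
                result.add(item);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // The key plays the role of "equals without some field":
        // here, strings are compared ignoring case.
        Function<String, String> lowerCase = s -> s.toLowerCase(Locale.ROOT);
        List<String> minuend = Arrays.asList("a", "A", "b");
        List<String> subtrahend = Arrays.asList("B", "c");
        System.out.println(diff(minuend, subtrahend, lowerCase)); // prints [a, A]
    }
}
```

For an object with several relevant fields, the key could be something like Arrays.asList(foo.getX(), foo.getY()), since List defines equals and hashCode element-wise; getX and getY are placeholder accessor names.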
pom.xml:
<dependency>
<groupId>org.apache.commons</groupId>
<artifactId>commons-collections4</artifactId>
<version>4.4</version>
</dependency>
code/test:
package com.my;
import lombok.Builder;
import lombok.Getter;
import lombok.ToString;
import org.apache.commons.collections4.CollectionUtils;
import org.apache.commons.collections4.Equator;
import java.util.Collection;
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;
import java.util.function.Function;
public class Diff {
public static class FieldEquator<T> implements Equator<T> {
private final Function<T, Object>[] functions;
@SafeVarargs
public FieldEquator(Function<T, Object>... functions) {
if (Objects.isNull(functions) || functions.length < 1) {
throw new UnsupportedOperationException();
}
this.functions = functions;
}
@Override
public boolean equate(T o1, T o2) {
if (Objects.isNull(o1) && Objects.isNull(o2)) {
return true;
}
if (Objects.isNull(o1) || Objects.isNull(o2)) {
return false;
}
for (Function<T, ?> function : functions) {
if (!Objects.equals(function.apply(o1), function.apply(o2))) {
return false;
}
}
return true;
}
@Override
public int hash(T o) {
if (Objects.isNull(o)) {
return -1;
}
int i = 0;
Object[] vals = new Object[functions.length];
for (Function<T, Object> function : functions) {
vals[i] = function.apply(o);
i++;
}
return Objects.hash(vals);
}
}
@SafeVarargs
private static <T> Set<T> difference(Collection<T> a, Collection<T> b, Function<T, Object>... functions) {
if ((Objects.isNull(a) || a.isEmpty()) && Objects.nonNull(b) && !b.isEmpty()) {
return new HashSet<>(b);
} else if ((Objects.isNull(b) || b.isEmpty()) && Objects.nonNull(a) && !a.isEmpty()) {
return new HashSet<>(a);
}
Equator<T> eq = new FieldEquator<>(functions);
Collection<T> res = CollectionUtils.removeAll(a, b, eq);
res.addAll(CollectionUtils.removeAll(b, a, eq));
return new HashSet<>(res);
}
/**
* Test
*/
@Builder
@Getter
@ToString
public static class A {
String a;
String b;
String c;
}
public static void main(String[] args) {
Set<A> as1 = new HashSet<>();
Set<A> as2 = new HashSet<>();
A a1 = A.builder().a("1").b("1").c("1").build();
A a2 = A.builder().a("1").b("1").c("2").build();
A a3 = A.builder().a("2").b("1").c("1").build();
A a4 = A.builder().a("1").b("3").c("1").build();
A a5 = A.builder().a("1").b("1").c("1").build();
A a6 = A.builder().a("1").b("1").c("2").build();
A a7 = A.builder().a("1").b("1").c("6").build();
as1.add(a1);
as1.add(a2);
as1.add(a3);
as2.add(a4);
as2.add(a5);
as2.add(a6);
as2.add(a7);
System.out.println("Set1: " + as1);
System.out.println("Set2: " + as2);
// Check A::getA, A::getB ignore A::getC
Collection<A> difference = difference(as1, as2, A::getA, A::getB);
System.out.println("Diff: " + difference);
}
}
result:
Set1: [Diff.A(a=2, b=1, c=1), Diff.A(a=1, b=1, c=1), Diff.A(a=1, b=1, c=2)]
Set2: [Diff.A(a=1, b=1, c=6), Diff.A(a=1, b=1, c=2), Diff.A(a=1, b=3, c=1), Diff.A(a=1, b=1, c=1)]
Diff: [Diff.A(a=1, b=3, c=1), Diff.A(a=2, b=1, c=1)]