
Is there a simple way in Java to get the difference between two collections using a custom equals function without overriding equals?

I'm open to using a library. I just want something simple to diff two collections on a criterion other than the normal equals function.

Right now I use something like:

collection1.stream()
           .filter(element -> !collection2.stream()
                                          .anyMatch(element2 -> element2.equalsWithoutSomeField(element)))
           .collect(Collectors.toSet());

And I would like something like:

Collections.diff(collection1, collection2, Foo::equalsWithoutSomeField);

(Edit) More context:

I should have mentioned that I'm looking for something that already exists, not something to code myself. If nothing exists, I might write a small utility based on your ideas.

Also, real duplicates aren't possible in my case: the collections are Sets. However, duplicates according to the custom equals are possible and should not be removed by this operation. This seems to be a limitation of many possible solutions.

We use similar methods in our project to shorten repetitive collection filtering. We started with some basic building blocks:

static <T> boolean anyMatch(Collection<T> set, Predicate<T> match) {
    for (T object : set)
        if (match.test(object))
            return true;
    return false;
}

Based on this, we can easily implement methods like noneMatch and more complicated ones like isSubset or your diff:

static <E> Collection<E> disjunctiveUnion(Collection<E> c1, Collection<E> c2, BiPredicate<E, E> match)
{
    ArrayList<E> diff = new ArrayList<>();
    diff.addAll(c1);
    diff.addAll(c2);
    diff.removeIf(e -> anyMatch(c1, e1 -> match.test(e, e1)) 
                       && anyMatch(c2, e2 -> match.test(e, e2)));
    return diff;
}

Note that there is certainly room for performance tuning. But keeping things separated into small methods makes them easy to understand and use, and they read quite nicely in calling code.
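One possible performance tuning, assuming the custom equality can be expressed as "these extracted keys are equal" (for example a record or List of the compared field values, with proper equals/hashCode): hash the keys of the second collection once, so the difference costs O(n + m) expected time instead of quadratic pairwise matching. KeyedDiff and diffByKey are hypothetical names:

```java
import java.util.Collection;
import java.util.List;
import java.util.Set;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical helper: key-based difference. Assumes the custom equality is
// "same key" for some key extractor whose result has proper equals/hashCode.
final class KeyedDiff {
    private KeyedDiff() { }

    static <T, K> List<T> diffByKey(Collection<T> minuend,
                                    Collection<T> subtrahend,
                                    Function<? super T, K> key) {
        // Hash the subtrahend's keys once: O(m) expected.
        Set<K> excluded = subtrahend.stream().map(key).collect(Collectors.toSet());
        // Each membership check is O(1) expected, so the whole diff is O(n + m).
        // Elements of the minuend are filtered, never merged, so custom duplicates survive.
        return minuend.stream()
                .filter(t -> !excluded.contains(key.apply(t)))
                .collect(Collectors.toList());
    }
}
```

This only helps when the criterion can be turned into a key; an arbitrary BiPredicate still needs pairwise matching.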

You would then use disjunctiveUnion as you already described:

CollectionUtils.disjunctiveUnion(collection1, collection2, Foo::equalsWithoutSomeField);
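The same building blocks cover the other methods mentioned above. Note that disjunctiveUnion as written returns the symmetric difference (elements of either collection with no match in the other, assuming match is reflexive), whereas the question asks for collection1 minus collection2. A sketch, under the hypothetical class name MatchUtils, of noneMatch, isSubset and a one-sided diff:

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.function.BiPredicate;
import java.util.function.Predicate;

// Hypothetical utility class built on the answer's anyMatch building block.
final class MatchUtils {
    private MatchUtils() { }

    static <T> boolean anyMatch(Collection<T> set, Predicate<T> match) {
        for (T object : set)
            if (match.test(object))
                return true;
        return false;
    }

    // True when no element satisfies the predicate.
    static <T> boolean noneMatch(Collection<T> set, Predicate<T> match) {
        return !anyMatch(set, match);
    }

    // True when every element of sub has a matching element in sup.
    static <E> boolean isSubset(Collection<E> sub, Collection<E> sup, BiPredicate<E, E> match) {
        for (E e : sub)
            if (noneMatch(sup, s -> match.test(e, s)))
                return false;
        return true;
    }

    // One-sided difference c1 - c2: elements of c1 with no match in c2.
    // Elements of c1 are never compared with each other, so custom duplicates are kept.
    static <E> List<E> diff(Collection<E> c1, Collection<E> c2, BiPredicate<E, E> match) {
        List<E> result = new ArrayList<>();
        for (E e : c1)
            if (noneMatch(c2, e2 -> match.test(e, e2)))
                result.add(e);
        return result;
    }
}
```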

Taking Jose Da Silva's suggestion into account, you could even use a Comparator to build your criteria on the fly:

Comparator<Foo> special = Comparator.comparing(Foo::thisField)
                                    .thenComparing(Foo::thatField);
BiPredicate<Foo, Foo> specialMatch = (e1, e2) -> special.compare(e1, e2) == 0;
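A self-contained sketch of the same idea, with a hypothetical record Foo whose thisField and thatField are compared while ignoredField is skipped, and a small hypothetical matching helper that turns any Comparator into the BiPredicate expected by disjunctiveUnion:

```java
import java.util.Comparator;
import java.util.function.BiPredicate;

// Hypothetical names throughout: ComparatorMatch, matching, Foo and its fields.
final class ComparatorMatch {
    private ComparatorMatch() { }

    // Two elements "match" when the comparator considers them equal.
    static <T> BiPredicate<T, T> matching(Comparator<? super T> comparator) {
        return (a, b) -> comparator.compare(a, b) == 0;
    }

    record Foo(String thisField, int thatField, String ignoredField) { }

    // Compares thisField, then thatField; ignoredField plays no part.
    static final Comparator<Foo> SPECIAL =
            Comparator.comparing(Foo::thisField).thenComparingInt(Foo::thatField);

    static final BiPredicate<Foo, Foo> SPECIAL_MATCH = matching(SPECIAL);
}
```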

You can use UnifiedSetWithHashingStrategy from Eclipse Collections. UnifiedSetWithHashingStrategy allows you to create a Set with a custom HashingStrategy, which lets the user supply a custom hashCode() and equals(). The objects' own hashCode() and equals() are not used.

Edit based on a requirement from the OP via comment:

You can use reject() or removeIf() depending on your requirement.

Code example:

// Common code
Person person1 = new Person("A", "A");
Person person2 = new Person("B", "B");
Person person3 = new Person("C", "A");
Person person4 = new Person("A", "D");
Person person5 = new Person("E", "E");

MutableSet<Person> personSet1 = Sets.mutable.with(person1, person2, person3);
MutableSet<Person> personSet2 = Sets.mutable.with(person2, person4, person5);

HashingStrategy<Person> hashingStrategy =
    HashingStrategies.fromFunction(Person::getLastName);

1) Using reject(): creates a new Set containing all the elements that do not satisfy the Predicate.

@Test
public void reject()
{
    MutableSet<Person> personHashingStrategySet = HashingStrategySets.mutable.withAll(
        hashingStrategy, personSet2);

    // reject creates a new copy
    MutableSet<Person> rejectSet = personSet1.reject(personHashingStrategySet::contains);
    Assert.assertEquals(Sets.mutable.with(person1, person3), rejectSet);
}

2) Using removeIf(): mutates the original Set by removing the elements that satisfy the Predicate.

@Test
public void removeIfTest()
{
    MutableSet<Person> personHashingStrategySet = HashingStrategySets.mutable.withAll(
        hashingStrategy, personSet2);

    // removeIf mutates the personSet1
    personSet1.removeIf(personHashingStrategySet::contains);
    Assert.assertEquals(Sets.mutable.with(person1, person3), personSet1);
}

Answer from before the OP's requirement in the comments, kept for reference in case others find it useful:

3) Using the Sets.differenceInto() API available in Eclipse Collections:

In the code below, set1 and set2 are two sets that use Person's equals() and hashCode(). The differenceSet is a UnifiedSetWithHashingStrategy, so it uses hashingStrategy (based on lastName) to define uniqueness. Hence, even though set2 does not contain person3, person3 has the same lastName as person1, so the differenceSet contains only person1.

@Test
public void differenceTest()
{
    MutableSet<Person> differenceSet = Sets.differenceInto(
        HashingStrategySets.mutable.with(hashingStrategy), 
        set1, 
        set2);

    Assert.assertEquals(Sets.mutable.with(person1), differenceSet);
}

Person class common to the code blocks above:

import java.util.Objects;

public class Person
{
    private final String firstName;
    private final String lastName;

    public Person(String firstName, String lastName)
    {
        this.firstName = firstName;
        this.lastName = lastName;
    }

    public String getFirstName()
    {
        return firstName;
    }

    public String getLastName()
    {
        return lastName;
    }

    @Override
    public boolean equals(Object o)
    {
        if (this == o)
        {
            return true;
        }
        if (o == null || getClass() != o.getClass())
        {
            return false;
        }
        Person person = (Person) o;
        return Objects.equals(firstName, person.firstName) &&
                Objects.equals(lastName, person.lastName);
    }

    @Override
    public int hashCode()
    {
        return Objects.hash(firstName, lastName);
    }
}

Javadocs: MutableSet, UnifiedSet, UnifiedSetWithHashingStrategy, HashingStrategy, Sets, reject, removeIf

Note: I am a committer for Eclipse Collections.

Comparing

You can achieve this without any library, using only Java's Comparator.

For instance, with the following class:

public class AA {
    private String a;
    private Double b;
    private String c;
    private int d;
    // getters and setters
}

You can use a comparator like:

Comparator<AA> comparator = Comparator.comparing(AA::getA)
        .thenComparing(AA::getB)
        .thenComparingInt(AA::getD);

This compares the fields a, b and the int d, skipping c.

The only problem here is that this won't work with null values: Comparator.comparing calls compareTo on the extracted key, so a null field causes a NullPointerException.
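A runnable sketch of the comparator above, assuming a record AA (accessors a(), b(), c(), d()) in place of the class with getters. It shows that two objects differing only in c compare as equal, and that a null b makes the comparator throw a NullPointerException:

```java
import java.util.Comparator;

// Sketch only: the record AA stands in for the answer's class with getters.
final class AAComparatorDemo {
    private AAComparatorDemo() { }

    record AA(String a, Double b, String c, int d) { }

    // Compares a, then b, then d; field c is never looked at.
    static final Comparator<AA> COMPARATOR = Comparator.comparing(AA::a)
            .thenComparing(AA::b)
            .thenComparingInt(AA::d);

    // "Equal" under the custom criterion, i.e. ignoring c.
    static boolean sameIgnoringC(AA x, AA y) {
        return COMPARATOR.compare(x, y) == 0;
    }
}
```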


Comparing nulls

One possible solution for fine-grained configuration, that is, allowing nulls only in specific fields, is a Comparator class similar to:

// Comparator for properties only; written to be used only with Comparator#comparing / thenComparing
public final class PropertyNullComparator<T extends Comparable<? super T>> 
                                            implements Comparator<Object> {
    private PropertyNullComparator() {  }
    public static <T extends Comparable<? super T>> PropertyNullComparator<T> of() {
        return new PropertyNullComparator<>();
    }
    @Override
    public int compare(Object o1, Object o2) {
        if (o1 != null && o2 != null) {
            if (o1 instanceof Comparable) {
                @SuppressWarnings({ "unchecked" })
                Comparable<Object> comparable = (Comparable<Object>) o1;
                return comparable.compareTo(o2);
            } else {
                // this throws a ClassCastException when neither object is Comparable
                @SuppressWarnings({ "unchecked" })
                Comparable<Object> comparable = (Comparable<Object>) o2;
                return comparable.compareTo(o1) * -1; // * -1 to keep order
            }
        } else {
            return o1 == o2 ? 0 : (o1 == null ? -1 : 1); // nulls first
        }
    }
}

This way you can build a comparator that specifies which fields are allowed to be null.

Comparator<AA> comparator = Comparator.comparing(AA::getA)
        .thenComparing(AA::getB, PropertyNullComparator.of())
        .thenComparingInt(AA::getD);

If you don't want to define a custom comparator, you can use something like:

Comparator<AA> comparator = Comparator.comparing(AA::getA)
        .thenComparing(AA::getB, Comparator.nullsFirst(Comparator.naturalOrder()))
        .thenComparingInt(AA::getD);
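A matching sketch of the nullsFirst variant, again assuming a record AA in place of the class with getters: two objects whose b is null now compare as equal, and a null b sorts before a non-null one. Fields compared without nullsFirst (here a) would still throw on null:

```java
import java.util.Comparator;

// Sketch only: null-tolerant comparison of field b using JDK comparators.
final class NullSafeFieldDemo {
    private NullSafeFieldDemo() { }

    record AA(String a, Double b, String c, int d) { }

    // a must be non-null; b may be null (nulls sort first); c is ignored.
    static final Comparator<AA> COMPARATOR = Comparator.comparing(AA::a)
            .thenComparing(AA::b, Comparator.nullsFirst(Comparator.naturalOrder()))
            .thenComparingInt(AA::d);
}
```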

Difference method

The difference (A - B) method could be implemented using two TreeSets.

static <T> TreeSet<T> difference(Collection<T> c1, 
                                 Collection<T> c2, 
                                 Comparator<T> comparator) {
    TreeSet<T> treeSet1 = new TreeSet<>(comparator);
    treeSet1.addAll(c1);
    if (treeSet1.size() > c2.size()) {
        treeSet1.removeAll(c2);
    } else {
        TreeSet<T> treeSet2 = new TreeSet<>(comparator);
        treeSet2.addAll(c2);
        treeSet1.removeAll(treeSet2);
    }
    return treeSet1;
}

Note: a TreeSet makes sense here since we are talking about uniqueness with a specific comparator. It could also perform better: TreeSet's contains method is O(log(n)), compared to O(n) for a plain ArrayList.

Why is a single TreeSet used only when treeSet1.size() > c2.size()? Because when that condition is not met, TreeSet#removeAll uses the contains method of the second collection. That second collection could be any Java collection, and its contains method is not guaranteed to behave exactly like the contains of the first TreeSet (with its custom comparator).
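If you would rather not depend on that size check at all, one option is to build a TreeSet for both inputs and remove with removeIf, so every membership test goes through a TreeSet with the same comparator. A sketch under the hypothetical name TreeSetDiff; like the original, it still merges elements of c1 that the comparator considers equal:

```java
import java.util.Collection;
import java.util.Comparator;
import java.util.TreeSet;

// Hypothetical helper: TreeSet-based difference whose result does not depend
// on the relative sizes of the inputs.
final class TreeSetDiff {
    private TreeSetDiff() { }

    static <T> TreeSet<T> difference(Collection<T> c1, Collection<T> c2,
                                     Comparator<? super T> comparator) {
        TreeSet<T> result = new TreeSet<>(comparator);
        result.addAll(c1);   // merges elements of c1 the comparator deems equal
        TreeSet<T> toRemove = new TreeSet<>(comparator);
        toRemove.addAll(c2);
        // removeIf iterates over result and asks toRemove.contains,
        // which always uses the comparator, never equals().
        result.removeIf(toRemove::contains);
        return result;
    }
}
```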


Edit (given the additional context in the question)

Since collection1 is a set that could contain repeated elements according to the custom equals (not the object's equals), the solution already provided in the question can be used: it does exactly that, without modifying either input collection, and creates a new output set.

So you can create your own static function (at least I am not aware of a library that provides a similar method) and use a Comparator or a BiPredicate.

static <T> Set<T> difference(Collection<T> collection1, 
                             Collection<T> collection2, 
                             Comparator<T> comparator) {
    return collection1.stream()
            .filter(element1 -> !collection2.stream()
                    .anyMatch(element2 -> comparator.compare(element1, element2) == 0))
            .collect(Collectors.toSet());
}
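A sketch of the BiPredicate flavor mentioned above, under the hypothetical class name CollectionDiff. It only filters collection1, so elements that are duplicates under the custom criterion are all kept, as the question requires; the result set relies on the elements' own equals, which is safe when collection1 is itself a Set:

```java
import java.util.Collection;
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.function.BiPredicate;
import java.util.stream.Collectors;

// Hypothetical helper class, not part of the JDK or any library.
final class CollectionDiff {
    private CollectionDiff() { }

    // Elements of collection1 with no match in collection2 according to `matches`.
    // collection1 is only filtered, never re-collected by the custom criterion,
    // so custom duplicates survive. Cost is O(n * m) match tests.
    static <T> Set<T> difference(Collection<T> collection1,
                                 Collection<T> collection2,
                                 BiPredicate<? super T, ? super T> matches) {
        return collection1.stream()
                .filter(e1 -> collection2.stream().noneMatch(e2 -> matches.test(e1, e2)))
                // LinkedHashSet keeps the encounter order of collection1.
                .collect(Collectors.toCollection(LinkedHashSet::new));
    }
}
```

With the question's types this would be called as CollectionDiff.difference(collection1, collection2, Foo::equalsWithoutSomeField), assuming equalsWithoutSomeField is an instance method of Foo that takes another Foo.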

Edit (to Eugene)

"Why would you want to implement a null-safe comparator yourself?"

At least to my knowledge, there isn't a ready-made comparator for fields that can simply be null. The closest I know of (to replace my suggested PropertyNullComparator.of(); a clearer, shorter or better name could be used) is:

Comparator.nullsFirst(Comparator.naturalOrder())

So you would have to write that line for every field you want to compare. Is this doable? Of course. Is it practical? I think not.

Easy solution: create a helper method.

static class ComparatorUtils {
    public static <T extends Comparable<? super T>> Comparator<T> shnp() { // super short null comparator
        return Comparator.nullsFirst(Comparator.<T>naturalOrder());
    }
}

Does this work? Yes. Is it practical? It looks like it. Is it a great solution? That depends: many people consider the exaggerated (and/or unnecessary) use of helper methods an anti-pattern (see a good old article by Nick Malik). Several reasons are listed there, but in short, this is an OO language, so OO solutions are normally preferred over static helper methods.


"As stated in the documentation: Note that the ordering maintained by a set (whether or not an explicit comparator is provided) must be consistent with equals if it is to correctly implement the Set interface. Further, the same problem would arise in the other case, when size() > c.size(), because ultimately this would still call equals in the remove method. So they both have to implement Comparator and equals consistently for this to work correctly."

The TreeSet javadoc says the following, but with a clear if:

Note that the ordering maintained by a set (whether or not an explicit comparator is provided) must be consistent with equals if it is to correctly implement the Set interface

Then it says:

See Comparable or Comparator for a precise definition of consistent with equals

And the Comparable javadoc says:

It is strongly recommended (though not required) that natural orderings be consistent with equals

Reading further in the same paragraph of the TreeSet javadoc, it says the following:

This is so because the Set interface is defined in terms of the equals operation, but a TreeSet instance performs all key comparisons using its compareTo (or compare) method, so two keys that are deemed equal by this method are, from the standpoint of the set, equal. The behavior of a set is well-defined even if its ordering is inconsistent with equals; it just fails to obey the general contract of the Set interface.

From this last quote, and with a very simple debugging session or even just by reading the code, you can see that TreeSet uses an internal TreeMap and that all its derived methods are based on the comparator, not the equals method.


"Why is this implemented like this? Because there is a difference between removing many elements from a small set and the other way around; as a matter of fact, the same holds for addAll."

If you go to the definition of removeAll, you can see that its implementation is in AbstractSet; TreeSet does not override it. When the argument collection is at least as large as the set, this implementation uses the argument collection's contains. The behavior of that contains is uncertain: it is neither necessary nor likely that the received collection (e.g. a list, a queue, etc.) defines the same comparator.

Update 1: This JDK bug is being discussed (and considered for a fix) here: https://bugs.openjdk.java.net/browse/JDK-6394757
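The comparator-based membership described in the javadoc quotes above is easy to observe with String.CASE_INSENSITIVE_ORDER, a comparator that is deliberately inconsistent with String.equals. A small sketch (TreeSetComparatorDemo is a hypothetical name):

```java
import java.util.TreeSet;

// Sketch: a TreeSet decides membership with its comparator, not equals().
final class TreeSetComparatorDemo {
    private TreeSetComparatorDemo() { }

    static TreeSet<String> caseInsensitiveSet(String... values) {
        TreeSet<String> set = new TreeSet<>(String.CASE_INSENSITIVE_ORDER);
        for (String v : values) {
            // add() leaves the set unchanged when compare(v, existing) == 0;
            // the first inserted representative is kept.
            set.add(v);
        }
        return set;
    }
}
```

The set never consults equals here: "A" is treated as a duplicate of "a", and contains("B") succeeds even though "B".equals("b") is false.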

static <T> Collection<T> diff(Collection<T> minuend, Collection<T> subtrahend, BiPredicate<T, T> equals) {
    Set<Wrapper<T>> w1 = minuend.stream().map(item -> new Wrapper<>(item, equals)).collect(Collectors.toSet());
    Set<Wrapper<T>> w2 = subtrahend.stream().map(item -> new Wrapper<>(item, equals)).collect(Collectors.toSet());
    w1.removeAll(w2);
    return w1.stream().map(w -> w.item).collect(Collectors.toList());
}

static class Wrapper<T> {
    T item;
    BiPredicate<T, T> equals;

    Wrapper(T item, BiPredicate<T, T> equals) {
        this.item = item;
        this.equals = equals;
    }

    @Override
    public int hashCode() {
        // all items have same hash code, check equals
        return 1;
    }

    @Override
    public boolean equals(Object that) {
        return equals.test(this.item, ((Wrapper<T>) that).item);
    }
}
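The Wrapper above hashes every item to 1, so every set lookup falls back to calling equals on the colliding entries (effectively a linear scan). Collecting the minuend with Collectors.toSet() also merges elements that the custom equals treats as duplicates, which the question says must not happen. A sketch of a variant that takes a hash function alongside the equality, similar in spirit to Eclipse Collections' HashingStrategy or commons-collections' Equator; HashedDiff, Key and the hasher parameter are hypothetical, and the hasher must return equal hashes for matching elements:

```java
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.BiPredicate;
import java.util.function.ToIntFunction;
import java.util.stream.Collectors;

// Hypothetical helper: hash-based diff with a custom equality and a consistent hasher.
final class HashedDiff {
    private HashedDiff() { }

    private static final class Key<T> {
        final T item;
        final BiPredicate<? super T, ? super T> equality;
        final int hash;

        Key(T item, BiPredicate<? super T, ? super T> equality, ToIntFunction<? super T> hasher) {
            this.item = item;
            this.equality = equality;
            this.hash = hasher.applyAsInt(item);   // must agree with equality
        }

        @Override
        public int hashCode() {
            return hash;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof Key)) {
                return false;
            }
            @SuppressWarnings("unchecked")
            Key<T> other = (Key<T>) o;
            return equality.test(item, other.item);
        }
    }

    static <T> List<T> diff(Collection<T> minuend, Collection<T> subtrahend,
                            BiPredicate<? super T, ? super T> equality,
                            ToIntFunction<? super T> hasher) {
        Set<Key<T>> excluded = new HashSet<>();
        for (T t : subtrahend) {
            excluded.add(new Key<>(t, equality, hasher));
        }
        // Filtering the minuend (instead of collecting it into a set) keeps
        // elements that are duplicates under the custom equality.
        return minuend.stream()
                .filter(t -> !excluded.contains(new Key<>(t, equality, hasher)))
                .collect(Collectors.toList());
    }
}
```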

pom.xml:

<dependency>
    <groupId>org.apache.commons</groupId>
    <artifactId>commons-collections4</artifactId>
    <version>4.4</version>
</dependency>

Code/test:

package com.my;

import lombok.Builder;
import lombok.Getter;
import lombok.ToString;
import org.apache.commons.collections4.CollectionUtils;
import org.apache.commons.collections4.Equator;

import java.util.Collection;
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;
import java.util.function.Function;

public class Diff {

    public static class FieldEquator<T> implements Equator<T> {
        private final Function<T, Object>[] functions;

        @SafeVarargs
        public FieldEquator(Function<T, Object>... functions) {
            if (Objects.isNull(functions) || functions.length < 1) {
                throw new UnsupportedOperationException();
            }
            this.functions = functions;
        }

        @Override
        public boolean equate(T o1, T o2) {
            if (Objects.isNull(o1) && Objects.isNull(o2)) {
                return true;
            }
            if (Objects.isNull(o1) || Objects.isNull(o2)) {
                return false;
            }
            for (Function<T, ?> function : functions) {
                if (!Objects.equals(function.apply(o1), function.apply(o2))) {
                    return false;
                }
            }
            return true;
        }

        @Override
        public int hash(T o) {
            if (Objects.isNull(o)) {
                return -1;
            }
            int i = 0;
            Object[] vals = new Object[functions.length];
            for (Function<T, Object> function : functions) {
                vals[i] = function.apply(o);
                i++;
            }
            return Objects.hash(vals);
        }
    }

    @SafeVarargs
    private static <T> Set<T> difference(Collection<T> a, Collection<T> b, Function<T, Object>... functions) {
        if ((Objects.isNull(a) || a.isEmpty()) && Objects.nonNull(b) && !b.isEmpty()) {
            return new HashSet<>(b);
        } else if ((Objects.isNull(b) || b.isEmpty()) && Objects.nonNull(a) && !a.isEmpty()) {
            return new HashSet<>(a);
        }

        Equator<T> eq = new FieldEquator<>(functions);

        Collection<T> res = CollectionUtils.removeAll(a, b, eq);
        res.addAll(CollectionUtils.removeAll(b, a, eq));

        return new HashSet<>(res);
    }

    /**
     * Test
     */

    @Builder
    @Getter
    @ToString
    public static class A {
        String a;
        String b;
        String c;
    }

    public static void main(String[] args) {
        Set<A> as1 = new HashSet<>();
        Set<A> as2 = new HashSet<>();

        A a1 = A.builder().a("1").b("1").c("1").build();
        A a2 = A.builder().a("1").b("1").c("2").build();
        A a3 = A.builder().a("2").b("1").c("1").build();
        A a4 = A.builder().a("1").b("3").c("1").build();
        A a5 = A.builder().a("1").b("1").c("1").build();
        A a6 = A.builder().a("1").b("1").c("2").build();
        A a7 = A.builder().a("1").b("1").c("6").build();

        as1.add(a1);
        as1.add(a2);
        as1.add(a3);

        as2.add(a4);
        as2.add(a5);
        as2.add(a6);
        as2.add(a7);

        System.out.println("Set1: " + as1);
        System.out.println("Set2: " + as2);

        // Check A::getA, A::getB ignore A::getC
        Collection<A> difference = difference(as1, as2, A::getA, A::getB);

        System.out.println("Diff: " + difference);
    }
}

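Result (note that difference() returns the symmetric difference, i.e. the elements of either set that have no match in the other, not just A minus B):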

Set1: [Diff.A(a=2, b=1, c=1), Diff.A(a=1, b=1, c=1), Diff.A(a=1, b=1, c=2)]
Set2: [Diff.A(a=1, b=1, c=6), Diff.A(a=1, b=1, c=2), Diff.A(a=1, b=3, c=1), Diff.A(a=1, b=1, c=1)]
Diff: [Diff.A(a=1, b=3, c=1), Diff.A(a=2, b=1, c=1)]
