
Removing duplicates from list where duplication logic is based on custom field

I have a list of objects of the following class:

public class TheInfo {
    private int id;
    private String fieldOne;
    private String fieldTwo;
    private String fieldThree;
    private String fieldFour;

   //Standard Getters, Setters, Equals, Hashcode, ToString methods
}

The list is required to be processed in such a way that

  1. Among duplicates, select the one with minimum ID, and remove others. In this particular case, entries are considered duplicate when their values of fieldOne and fieldTwo are equal.
  2. Get the concatenated value of fieldThree and fieldFour.

I want to process this list using Java 8 Streams. Currently I don't know how to remove duplicates based on custom fields. I don't think I can use distinct(), because I can't change the equals/hashCode methods: this duplication logic applies only to this specific case.

How can I achieve this?

Assuming you have

List<TheInfo> list;

you can use

// Needs java.util.* and java.util.stream.Collectors.
List<TheInfo> result = new ArrayList<>(list.stream().collect(
    // Key: the pair (fieldOne, fieldTwo) that defines a duplicate.
    Collectors.groupingBy(info -> Arrays.asList(info.getFieldOne(), info.getFieldTwo()),
        Collectors.collectingAndThen(
            Collectors.minBy(Comparator.comparingInt(TheInfo::getId)),
            Optional::get))).values());

The groupingBy collector produces groups according to a function whose results determine equality. A List already implements equals and hashCode for a sequence of values, so Arrays.asList(info.getFieldOne(), info.getFieldTwo()) produces a suitable key. In Java 9, you would most probably use List.of(info.getFieldOne(), info.getFieldTwo()) instead, keeping in mind that List.of rejects null elements.
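To illustrate why a List works as a composite key, here is a minimal sketch. The class name ListKeyDemo and the helper key are hypothetical and exist only for this illustration; the point is that two lists with equal elements in the same order are equal and hash alike.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class ListKeyDemo {
    // Hypothetical helper: builds a composite key from two field values.
    // List defines equals/hashCode element by element, in order.
    static List<String> key(String fieldOne, String fieldTwo) {
        return Arrays.asList(fieldOne, fieldTwo);
    }

    public static void main(String[] args) {
        Map<List<String>, Integer> seen = new HashMap<>();
        seen.put(key("a", "b"), 1);

        System.out.println(seen.containsKey(key("a", "b"))); // true: same elements, same order
        System.out.println(seen.containsKey(key("b", "a"))); // false: order matters
        // Arrays.asList tolerates null elements; List.of would throw NullPointerException here.
        System.out.println(key("a", null).equals(key("a", null))); // true
    }
}
```

If the fields can be null, Arrays.asList is probably the safer choice for the key.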

The second argument to groupingBy is another collector that determines how each group is processed. Collectors.minBy(…) folds a group to its minimum element according to a comparator, and Comparator.comparingInt(TheInfo::getId) is the right comparator for getting the element with the minimum id.

Unfortunately, the minBy collector produces an Optional that would be empty if there are no elements, but since we know that the groups can't be empty (groups without elements wouldn't be created in the first place), we can unconditionally call get on the optional to retrieve the actual value. This is what wrapping this collector in Collectors.collectingAndThen(…, Optional::get) does.

Now, the result of the grouping is a Map from the keys created by the function to the TheInfo instance with the minimum id. Calling values() on the Map gives us a Collection<TheInfo>, and since you want a List, a final new ArrayList<>(collection) will produce it.
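The steps described so far can be put together into a small runnable sketch. Info is a hypothetical stand-in for TheInfo, reduced to the fields the grouping needs, and GroupingByDemo and dedupe are names made up for this example. The key is assumed to be the pair (fieldOne, fieldTwo), as the question states.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.stream.Collectors;

class GroupingByDemo {
    // Hypothetical stand-in for TheInfo, reduced to the fields the grouping needs.
    static final class Info {
        private final int id;
        private final String fieldOne;
        private final String fieldTwo;

        Info(int id, String fieldOne, String fieldTwo) {
            this.id = id;
            this.fieldOne = fieldOne;
            this.fieldTwo = fieldTwo;
        }

        int getId() { return id; }
        String getFieldOne() { return fieldOne; }
        String getFieldTwo() { return fieldTwo; }
    }

    // Keeps, for each (fieldOne, fieldTwo) pair, the element with the smallest id.
    static List<Info> dedupe(List<Info> list) {
        Map<List<String>, Info> byKey = list.stream().collect(
            Collectors.groupingBy(
                info -> Arrays.asList(info.getFieldOne(), info.getFieldTwo()),
                Collectors.collectingAndThen(
                    Collectors.minBy(Comparator.comparingInt(Info::getId)),
                    Optional::get))); // safe: a group is never empty
        return new ArrayList<>(byKey.values());
    }

    public static void main(String[] args) {
        List<Info> input = Arrays.asList(
            new Info(3, "a", "b"), new Info(1, "a", "b"), new Info(2, "a", "c"));
        // The order of the result is unspecified, because groupingBy uses a HashMap by default.
        dedupe(input).forEach(info -> System.out.println(info.getId()));
    }
}
```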


Thinking about it, this might be one of the cases where the toMap collector is simpler to use, especially as merging the group elements doesn't benefit from mutable reduction:

List<TheInfo> result = new ArrayList<>(list.stream().collect(
    Collectors.toMap(
        // Key: the pair (fieldOne, fieldTwo) that defines a duplicate.
        info -> Arrays.asList(info.getFieldOne(), info.getFieldTwo()),
        Function.identity(),
        BinaryOperator.minBy(Comparator.comparingInt(TheInfo::getId)))).values());

This uses the same function for determining the key, a second function determining a single value (here just the identity function), and a reduction function that is called whenever a group has more than one element. This is again a function returning the minimum according to the ID comparator.
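The question doesn't say whether the order of the result matters. If it does, one possible variation, sketched here as an assumption rather than as part of the answer, is the four-argument toMap overload with LinkedHashMap::new as the map supplier, so that keys keep their first-encounter order on a sequential stream. Info, ToMapOrderDemo, and dedupeKeepingOrder are hypothetical names for this illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BinaryOperator;
import java.util.function.Function;
import java.util.stream.Collectors;

class ToMapOrderDemo {
    // Hypothetical stand-in for TheInfo, reduced to the fields the key needs.
    static final class Info {
        private final int id;
        private final String fieldOne;
        private final String fieldTwo;

        Info(int id, String fieldOne, String fieldTwo) {
            this.id = id;
            this.fieldOne = fieldOne;
            this.fieldTwo = fieldTwo;
        }

        int getId() { return id; }
        String getFieldOne() { return fieldOne; }
        String getFieldTwo() { return fieldTwo; }
    }

    // Keeps the minimum-id element per key; keys stay in first-encounter order.
    static List<Info> dedupeKeepingOrder(List<Info> list) {
        Map<List<String>, Info> byKey = list.stream().collect(
            Collectors.toMap(
                info -> Arrays.asList(info.getFieldOne(), info.getFieldTwo()),
                Function.identity(),
                BinaryOperator.minBy(Comparator.comparingInt(Info::getId)),
                LinkedHashMap::new)); // map supplier: preserves key insertion order
        return new ArrayList<>(byKey.values());
    }

    public static void main(String[] args) {
        List<Info> input = Arrays.asList(
            new Info(3, "a", "b"), new Info(2, "a", "c"), new Info(1, "a", "b"));
        List<Integer> ids = dedupeKeepingOrder(input).stream()
            .map(Info::getId)
            .collect(Collectors.toList());
        System.out.println(ids); // [1, 2]: key (a, b) was seen first, and its minimum id is 1
    }
}
```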

Using streams, you can do this with just the collector, provided you give it a proper classifier:

private static <T> T min(T first, T second, Comparator<? super T> cmp) {
  // On ties, keep the first argument.
  return cmp.compare(first, second) <= 0 ? first : second;
}

private static Map<List<String>, TheInfo> process(Collection<TheInfo> data) {
  Comparator<TheInfo> cmp = Comparator.comparingInt(TheInfo::getId);

  return data.stream()
      .collect(Collectors.toMap(
          // Classifier: the key is a tuple. The closest thing in the JDK is a List
          // or a custom class; List is used here for brevity.
          info -> Arrays.asList(info.getFieldOne(), info.getFieldTwo()),
          info -> info, // or Function.identity()
          // Merge function for duplicates: keep the minimum according to cmp.
          (a, b) -> min(a, b, cmp)));
}

The stream above is collected into a Map<List<String>, TheInfo> whose keys are two-element lists of strings and whose values are the minimal elements. You can take map.values() and return them in a new collection, or use them however you need.
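The second requirement in the question, the concatenated value of fieldThree and fieldFour, could be handled as a follow-up step on map.values(). The sketch below rests on assumptions: it reads the requirement as "one concatenated string per surviving element", concatenates without a separator, and sorts by id only to make the output deterministic. Info, ConcatDemo, and dedupeAndConcat are hypothetical names.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.function.BinaryOperator;
import java.util.function.Function;
import java.util.stream.Collectors;

class ConcatDemo {
    // Hypothetical stand-in for TheInfo with the five fields from the question.
    static final class Info {
        private final int id;
        private final String fieldOne, fieldTwo, fieldThree, fieldFour;

        Info(int id, String one, String two, String three, String four) {
            this.id = id;
            this.fieldOne = one;
            this.fieldTwo = two;
            this.fieldThree = three;
            this.fieldFour = four;
        }

        int getId() { return id; }
        String getFieldOne() { return fieldOne; }
        String getFieldTwo() { return fieldTwo; }
        String getFieldThree() { return fieldThree; }
        String getFieldFour() { return fieldFour; }
    }

    static List<String> dedupeAndConcat(Collection<Info> data) {
        // Step 1: keep the minimum-id element per (fieldOne, fieldTwo) key.
        Map<List<String>, Info> byKey = data.stream().collect(
            Collectors.toMap(
                info -> Arrays.asList(info.getFieldOne(), info.getFieldTwo()),
                Function.identity(),
                BinaryOperator.minBy(Comparator.comparingInt(Info::getId))));

        // Step 2: concatenate fieldThree and fieldFour of each survivor.
        return byKey.values().stream()
            .sorted(Comparator.comparingInt(Info::getId)) // assumption: order by id
            .map(info -> info.getFieldThree() + info.getFieldFour())
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Info> input = Arrays.asList(
            new Info(3, "a", "b", "x3", "y3"),
            new Info(1, "a", "b", "x1", "y1"),
            new Info(2, "a", "c", "x2", "y2"));
        System.out.println(dedupeAndConcat(input)); // [x1y1, x2y2]
    }
}
```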
