Merge two collections using streams, but only unique values, and using predicate instead of equals?

I'm trying to merge two collections, but do it conditionally where I only want to add unique values. And what constitutes uniqueness should be decided by a predicate (or similar), not the equals function.

For example, let's assume we have the following two collections of Employee objects:

List<Employee> list1 = Arrays.asList(
    new Employee(1, "Adam", "Smith", Type.CEO),
    new Employee(2, "Bob", "Jones", Type.OfficeManager),
    new Employee(3, "Carl", "Lewis", Type.SalesPerson));

List<Employee> list2 = Arrays.asList(
    new Employee(4, "Xerxes", "Brown", Type.OfficeManager),
    new Employee(5, "Yuri", "Gagarin", Type.Janitor),
    new Employee(6, "Zain", "Wilson", Type.SalesPerson));

...and let's assume that I want to merge these lists into a new list by adding elements from both list1 and list2, but excluding elements that have a corresponding "identical" Employee object already added to the new list, where uniqueness is determined by the Type enum (Type.CEO, Type.OfficeManager, etc.).

Then the expected result, after the merge, is a new list that contains the following employees:

Employee(1, "Adam", "Smith", Type.CEO)
Employee(2, "Bob", "Jones", Type.OfficeManager)
Employee(3, "Carl", "Lewis", Type.SalesPerson)
Employee(5, "Yuri", "Gagarin", Type.Janitor)

What would be the "best" way to achieve this, in a general Java 8/9 way? That is, I don't want to write something that is specific to Employee objects or the Type enum, and I don't want to write something that uses the equals method of the objects. Instead I would like to use a BiPredicate or something similar.

I would also like to avoid performing any looping myself. Streams seem like a good choice, but I can't figure out how to achieve this. How can I write a BiPredicate where one value comes from one stream and the other value comes from another stream, without performing the looping myself?

The reason I want to use a BiPredicate (or similar) is that I want to be able to use this function with advanced logic, where it is not possible to simply extract some property of all the elements and then group the values based on the uniqueness of this property.

Any suggestions?

/Jimi

Update: To clarify why I talk about a predicate, here is a more complex example:

Let's assume that we have the two collections of Employee objects as before. But this time the uniqueness logic can't be expressed using a mapping function to a specific property of the Employee object. Instead it uses some data in the EmployeeRegistry, like this: if two employees belong to the same tax bracket or if they are of the same "type", then they are considered equal. Because of this OR-logic, it is not possible to reduce this to a unique key to use in grouping the data or something like that.

Update 2: For the sake of simplicity, below is a less complex example that is still complex enough not to be a simple mapping to a field. It is a bit contrived, but that is for the sake of simplicity.

Let's assume that we have two collections of Strings, and uniqueness is calculated like this:

  • If two strings are of equal length, they are considered equal
  • Otherwise, if two strings start with the same character, they are considered equal
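The rule in the two bullets above can be written directly as a BiPredicate. The following is only a minimal sketch: the class name StringSimilarity and the constant SIMILAR are hypothetical, and it assumes non-null strings. It also shows that the relation is not transitive, which is why it cannot be backed by a well-behaved equals/hashCode pair.

```java
import java.util.function.BiPredicate;

final class StringSimilarity {

    // Hypothetical name: true when two strings count as "equal" under the rule above.
    // Assumes non-null inputs; empty strings can only match by length.
    static final BiPredicate<String, String> SIMILAR = (a, b) ->
            a.length() == b.length()
            || (!a.isEmpty() && !b.isEmpty() && a.charAt(0) == b.charAt(0));

    public static void main(String[] args) {
        System.out.println(SIMILAR.test("kiwi", "pear"));   // true: same length
        System.out.println(SIMILAR.test("pear", "papaya")); // true: same first character
        System.out.println(SIMILAR.test("kiwi", "papaya")); // false: so the relation is not transitive
    }
}
```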

Using the method Collectors.toMap, as suggested by Federico Peralta Schaffner, seems to work, although I'm not sure how I can write a hashCode() implementation that follows the standard and at the same time is efficient. The only functional implementation I can think of is one that returns a constant value (i.e. the same value regardless of the string).

Update 3: Considering that the OR-logic of my "equalness" algorithm breaks the equals contract, and makes it difficult (impossible?) to write an effective hashCode implementation, I am now back where I started, i.e. needing something like a predicate of some sort. Here is an updated, "real world" example:

Let's assume that we have the two collections of Employee objects as before, and we want to merge these collections into one. But this time we want to avoid including people that don't get along. To determine if two people get along, we have a HumanRelationsDepartment object, with the method isOkToWorkWithEachother(Person, Person). When two people that don't get along are detected, only one of them is to be added to the new collection. Which one can be determined by a mapping function, and the default logic could be that the first person is selected.

It is rather trivial to write old school code that solves this problem. What I'm looking for is a loop-free stream based solution. Does such a solution exist? Performance is not an issue.
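One way such a predicate-driven merge could look is a custom collector that keeps an element only if no previously kept element matches it. This is a sketch under stated assumptions, not an established library facility: PredicateMerge, addIfNew, distinctBy and merge are hypothetical names, the comparison is quadratic (acceptable here since performance is not an issue), and the "first one wins" behaviour relies on a sequential, ordered stream.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.BiPredicate;
import java.util.stream.Collector;
import java.util.stream.Stream;

final class PredicateMerge {

    // Adds the candidate only if no element kept so far is "the same" per the predicate.
    private static <T> void addIfNew(List<T> kept, T candidate, BiPredicate<T, T> same) {
        if (kept.stream().noneMatch(k -> same.test(k, candidate))) {
            kept.add(candidate);
        }
    }

    // Hypothetical collector: the first element of each group of "same" elements is kept.
    static <T> Collector<T, ?, List<T>> distinctBy(BiPredicate<T, T> same) {
        return Collector.of(
                ArrayList::new,
                (List<T> kept, T candidate) -> addIfNew(kept, candidate, same),
                (left, right) -> {
                    right.forEach(candidate -> addIfNew(left, candidate, same));
                    return left;
                });
    }

    static <T> List<T> merge(List<T> first, List<T> second, BiPredicate<T, T> same) {
        return Stream.concat(first.stream(), second.stream()).collect(distinctBy(same));
    }

    public static void main(String[] args) {
        // The string rule from Update 2; assumes non-empty strings.
        BiPredicate<String, String> same = (a, b) ->
                a.length() == b.length() || a.charAt(0) == b.charAt(0);
        System.out.println(merge(Arrays.asList("apple", "kiwi"),
                                 Arrays.asList("avocado", "pear", "fig"), same));
        // prints [apple, kiwi, fig]
    }
}
```

For the employee case, the predicate would presumably be the negation of isOkToWorkWithEachother. Because such a relation need not be transitive, the result depends on the order in which elements are encountered.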

You can achieve what you want by means of Collectors.toMap:

Collection<Employee> merged = Stream.of(list1, list2)
    .flatMap(Collection::stream)
    .collect(Collectors.toMap(e -> calculateGroup(e), e -> e, (e1, e2) -> e1))
    .values();

So this creates a Map<SomeGroupType, Employee>, according to some calculateGroup method that receives an Employee instance and returns something that represents the group which the Employee belongs to. This could be some property of the Employee, e.g. type, or something more complicated that could get data from somewhere else to determine the group, e.g. tax bracket, as per the employee's annual income. This is the key of the map, which will determine the uniqueness according to your specific needs. The only requirement of this approach is that whatever class you use for the keys must implement equals and hashCode consistently.

The values of the map will be just the Employee instances of the concatenated streams. For the merge function (the 3rd argument of Collectors.toMap), I've used (e1, e2) -> e1, meaning that we'll keep the values already present in the map when there are equal keys. If you want to overwrite values instead, change it to (e1, e2) -> e2.
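Applied to the lists from the question, this approach might look like the sketch below. The Employee class and Type enum are hypothetical minimal stand-ins (the originals are not shown), mergeByKey is an illustrative name, and the LinkedHashMap supplier is an addition made here so that the result keeps encounter order, which a plain HashMap does not guarantee.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;
import java.util.stream.Stream;

final class ToMapMerge {

    enum Type { CEO, OfficeManager, SalesPerson, Janitor }

    // Hypothetical minimal Employee; the real class is not shown in the question.
    static final class Employee {
        final int id;
        final String firstName;
        final String lastName;
        final Type type;

        Employee(int id, String firstName, String lastName, Type type) {
            this.id = id;
            this.firstName = firstName;
            this.lastName = lastName;
            this.type = type;
        }

        Type getType() { return type; }
    }

    // Keeps the first element seen for every key; later elements with the same key are dropped.
    static <T, K> List<T> mergeByKey(Function<T, K> keyFn, List<T> first, List<T> second) {
        Map<K, T> byKey = Stream.concat(first.stream(), second.stream())
                .collect(Collectors.toMap(keyFn, e -> e, (kept, dup) -> kept, LinkedHashMap::new));
        return new ArrayList<>(byKey.values());
    }

    public static void main(String[] args) {
        List<Employee> list1 = Arrays.asList(
                new Employee(1, "Adam", "Smith", Type.CEO),
                new Employee(2, "Bob", "Jones", Type.OfficeManager),
                new Employee(3, "Carl", "Lewis", Type.SalesPerson));
        List<Employee> list2 = Arrays.asList(
                new Employee(4, "Xerxes", "Brown", Type.OfficeManager),
                new Employee(5, "Yuri", "Gagarin", Type.Janitor),
                new Employee(6, "Zain", "Wilson", Type.SalesPerson));

        mergeByKey(Employee::getType, list1, list2)
                .forEach(e -> System.out.println(e.id + " " + e.firstName + " " + e.type));
        // prints employees 1, 2, 3 and 5, one per line
    }
}
```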

// Concatenate the streams
Stream.concat(list1.stream(), list2.stream())
    .collect(
        // Collect similar employees together
        Collectors.toMap(
            // Where "similar" is determined by a function, e.g. Employee::getType
            keyFn,
            // Keep the employee itself as the map value
            Function.identity(),
            // Take the first employee found
            (a, b) -> a))
    // Discard the keys.
    .values();

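For a simple merging of the two streams, you can use concat together with groupingBy and a reducing collector (adjust the reducer's logic to choose which employee to keep):

Collection<Employee> merged = Stream.concat(list1.stream(), list2.stream())
    .collect(Collectors.groupingBy(emp -> emp.getType(),
                                   // The identity is null, so fall back to e2 for the first element of each group
                                   Collectors.reducing(null, (e1, e2) -> e1 == null ? e2 : e1)))
    .values();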

For element-wise merging of the 2 collections (assuming same length), you can use an index-based integer stream, to simulate a zipping of the two lists, then use a reducer that merges the two into one.

1 - Ensure lists are sorted by type, as that's what determines uniqueness:

List<Employee> list1Sorted = list1.stream()
       .sorted(Comparator.comparing(Employee::getType))
       .collect(Collectors.toList());

List<Employee> list2Sorted = list2.stream()
       .sorted(Comparator.comparing(Employee::getType))
       .collect(Collectors.toList());

2 - Declare a "reducer" that will merge 2 objects at the same index:

//This is returning an arbitrary value. You may want to add your own logic:
BiFunction<Employee, Employee, Employee> reducer = (e1, e2) -> e1;

3 - Now assume lists have the same length and simulate a zip operation:

List<Employee> mergedList = IntStream.range(0,  list1.size())
    .mapToObj(i -> new Employee[] {list1Sorted.get(i), list2Sorted.get(i)})
    .map(e -> reducer.apply(e[0], e[1]))
    .collect(Collectors.toList());

To simplify it: make a generic zip method:

public static <T> List<T> zipStreams(List<T> list1, List<T> list2, BiFunction<T, T, T> employeeMerger, Comparator<T> sortComparator) {

    if(list1.size() != list2.size()) {
        throw new IllegalArgumentException("Lists must be of the same length");
    }

    List<T> list1Sorted = sortComparator == null ? list1 : list1.stream()
                    .sorted(sortComparator)
                    .collect(Collectors.toList());
    List<T> list2Sorted = sortComparator == null ? list2 : list2.stream()
                    .sorted(sortComparator)
                    .collect(Collectors.toList());

    return IntStream.range(0,  list1Sorted.size())
            .mapToObj(i -> Arrays.<T>asList(list1Sorted.get(i), list2Sorted.get(i)))
            .map(list -> employeeMerger.apply(list.get(0), list.get(1)))
            .collect(Collectors.toList());
}

Obviously, this is very specific to merging employee lists element-wise.

Now we can call that with:

zipStreams(list1, list2, (e1, e2) -> e1, Comparator.comparing(Employee::getType));

Map them to a map that uses the unique values as keys, then map the entries to a list.
