
How to compare two Lists of Maps to identify matching and non-matching records with multiple filter predicates in Java 8 Streams

The requirement is to get all matching and non-matching records from two Lists of Maps using streams, with more than one matching criterion. That is, instead of a single filter that compares only "Email", I need to compare the two lists with multiple filter predicates that check both Email and Id.

List1:

[{"Email":"naveen@domain.com", "Id":"A1"},
 {"Email":"test@domain.com", "Id":"A2"}]

List2:

[{"Email":"naveen@domain.com", "Id":"A1"},
 {"Email":"test@domain.com", "Id":"A2"},
 {"Email":"test1@domain.com", "Id":"B1"}]

Using streams, I am able to find the matching and non-matching records with a single filter predicate on Email.

Matching Records:

[{"Email":"naveen@domain.com", "Id":"A1"},
 {"Email":"test@domain.com", "Id":"A2"}]

Non Matching Records:

[{"Email":"test1@domain.com", "Id":"B1"}]

Is there a way to compare both Email and Id instead of just Email?

dbRecords.parallelStream().filter(searchData ->
                inputRecords.parallelStream().anyMatch(inputMap ->
                    searchData.get("Email").equals(inputMap.get("Email")))).
                collect(Collectors.toList());

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
public class ListFiltersToGetMatchingRecords {


    public static void main(String[] args) {

        long startTime = System.currentTimeMillis();
        List<Map<String, Object>> dbRecords = createDbRecords();
        List<Map<String, Object>> inputRecords = createInputRecords();

        List<Map<String,Object>> matchinRecords = dbRecords.parallelStream().filter(searchData ->
                inputRecords.parallelStream().anyMatch(inputMap ->
                    searchData.get("Email").equals(inputMap.get("Email")))).
                collect(Collectors.toList());

        List<Map<String,Object>> notMatchinRecords = inputRecords.parallelStream().filter(searchData ->
                dbRecords.parallelStream().noneMatch( inputMap ->
                        searchData.get("Email").equals(inputMap.get("Email"))
                )).collect(Collectors.toList());

        long endTime = System.currentTimeMillis();
        System.out.println("Matching Records: " + matchinRecords.size());
        matchinRecords.forEach(record -> {
            System.out.println(record.get("Email"));
        });

        System.out.println("Non Matching Records: " + notMatchinRecords.size());
        notMatchinRecords.forEach(record -> {
            System.out.println(record.get("Email"));
        });
        System.out.println("Non Matching Records: " + notMatchinRecords.size());
        System.out.println("Matching Records: " + matchinRecords.size());
        // Report milliseconds: integer division by 1000 would print 0 for short runs.
        System.out.println("TotalTimeTaken = " + (endTime - startTime) + " ms");
    }

    private static List<Map<String, Object>> createDbRecords() {
        List<Map<String, Object>> dbRecords = new ArrayList<>();
        for(int i =0; i< 100; i+=2) {
            Map<String, Object> dbRecord = new HashMap<>();
            dbRecord.put("Email","naveen" + i +"@gmail.com");
            dbRecord.put("Id", "ID" + i);
            dbRecords.add(dbRecord);
        }
        return dbRecords;
    }

    private static List<Map<String, Object>> createInputRecords() {
        List<Map<String, Object>> inputRecords = new ArrayList<>();
        for(int i =0; i< 100; i++) {
            Map<String, Object> inputRecord = new HashMap<>();
            inputRecord.put("Email", "naveen" + i +"@gmail.com");
            // Same key spelling as in createDbRecords(), so that Id values can be compared.
            inputRecord.put("Id", "ID" + i);
            inputRecords.add(inputRecord);
        }
        return inputRecords;
    }
}

If you care about performance, you should not combine a linear search with another linear search; the resulting O(n×m) time complexity can't be fixed with parallel processing when the lists get large.

You should first build a data structure which allows efficient lookups:

// Index the input records by a composite key made of the compared values.
Map<List<?>,Map<String, Object>> inputKeys = inputRecords.stream()
    .collect(Collectors.toMap(
        m -> Arrays.asList(m.get("Id"),m.get("Email")),
        m -> m,
        (a,b) -> { throw new IllegalStateException("duplicate "+a+" and "+b); },
        LinkedHashMap::new));

// A db record matches when its composite key is present in the index.
List<Map<String,Object>> matchinRecords = dbRecords.stream()
    .filter(m -> inputKeys.containsKey(Arrays.asList(m.get("Id"),m.get("Email"))))
    .collect(Collectors.toList());

// Whatever remains in the index after removing the matches did not match.
matchinRecords.forEach(m -> inputKeys.remove(Arrays.asList(m.get("Id"),m.get("Email"))));
List<Map<String,Object>> notMatchinRecords = new ArrayList<>(inputKeys.values());

This solution keeps the identity of the Maps, i.e. the result lists contain the original Map instances.

If you are only interested in the values associated with the "Email" key, it would be much simpler:

// Index Id -> Email for all input records; the matches are removed from it further down.
Map<Object,Object> notMatchinRecords = inputRecords.stream()
    .collect(Collectors.toMap(
        m -> m.get("Id"),
        m -> m.get("Email"),
        (a,b) -> { throw new IllegalStateException("duplicate"); },
        LinkedHashMap::new));

// Sentinel that equals nothing else, so an unknown Id never matches.
Object notPresent = new Object();
Map<Object,Object> matchinRecords = dbRecords.stream()
    .filter(m -> notMatchinRecords.getOrDefault(m.get("Id"), notPresent)
                                  .equals(m.get("Email")))
    .collect(Collectors.toMap(
        m -> m.get("Id"),
        m -> m.get("Email"),
        (a,b) -> { throw new IllegalStateException("duplicate"); },
        LinkedHashMap::new));

notMatchinRecords.keySet().removeAll(matchinRecords.keySet());

System.out.println("Matching Records: " + matchinRecords.size());
matchinRecords.forEach((id,email) -> System.out.println(email));

System.out.println("Non Matching Records: " + notMatchinRecords.size());
notMatchinRecords.forEach((id,email) -> System.out.println(email));

The first variant can easily be extended to support more or other map entries:

List<String> keys = Arrays.asList("Id", "Email");

Function<Map<String,Object>,List<?>> getKey
    = m -> keys.stream().map(m::get).collect(Collectors.toList());

Map<List<?>,Map<String, Object>> inputKeys = inputRecords.stream()
    .collect(Collectors.toMap(
        getKey,
        m -> m,
        (a,b) -> { throw new IllegalStateException("duplicate "+a+" and "+b); },
        LinkedHashMap::new));

List<Map<String,Object>> matchinRecords = dbRecords.stream()
    .filter(m -> inputKeys.containsKey(getKey.apply(m)))
    .collect(Collectors.toList());

matchinRecords.forEach(m -> inputKeys.remove(getKey.apply(m)));
List<Map<String,Object>> notMatchinRecords = new ArrayList<>(inputKeys.values());
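
For reference, the generalized variant can be assembled into a small self-contained program. This is only a sketch under assumptions: the class name KeyedRecordMatcher and the helpers record, keyOf, index, matching and nonMatching are hypothetical names introduced here, the records use the keys "Email" and "Id" as in the question's sample data, and the data is the question's List1 (as dbRecords) and List2 (as inputRecords).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

public class KeyedRecordMatcher {

    // Hypothetical helper: one record with the two keys from the question.
    static Map<String, Object> record(String email, String id) {
        Map<String, Object> m = new HashMap<>();
        m.put("Email", email);
        m.put("Id", id);
        return m;
    }

    // Composite key: the values of the compared fields, in a fixed order.
    static Function<Map<String, Object>, List<Object>> keyOf(List<String> fields) {
        return m -> fields.stream().map(m::get).collect(Collectors.toList());
    }

    // Index records by composite key; a duplicate key is reported instead of merged.
    static Map<List<Object>, Map<String, Object>> index(
            List<Map<String, Object>> records,
            Function<Map<String, Object>, List<Object>> key) {
        return records.stream().collect(Collectors.toMap(
                key,
                m -> m,
                (a, b) -> { throw new IllegalStateException("duplicate " + a + " and " + b); },
                LinkedHashMap::new));
    }

    // Records of dbRecords whose composite key also occurs in inputRecords.
    static List<Map<String, Object>> matching(List<Map<String, Object>> dbRecords,
            List<Map<String, Object>> inputRecords, List<String> fields) {
        Function<Map<String, Object>, List<Object>> key = keyOf(fields);
        Map<List<Object>, Map<String, Object>> inputKeys = index(inputRecords, key);
        return dbRecords.stream()
                .filter(m -> inputKeys.containsKey(key.apply(m)))
                .collect(Collectors.toList());
    }

    // Records of inputRecords whose composite key occurs nowhere in dbRecords.
    static List<Map<String, Object>> nonMatching(List<Map<String, Object>> dbRecords,
            List<Map<String, Object>> inputRecords, List<String> fields) {
        Function<Map<String, Object>, List<Object>> key = keyOf(fields);
        Map<List<Object>, Map<String, Object>> inputKeys = index(inputRecords, key);
        dbRecords.forEach(m -> inputKeys.remove(key.apply(m)));
        return new ArrayList<>(inputKeys.values());
    }

    public static void main(String[] args) {
        List<String> fields = Arrays.asList("Id", "Email");
        List<Map<String, Object>> dbRecords = Arrays.asList(
                record("naveen@domain.com", "A1"),
                record("test@domain.com", "A2"));
        List<Map<String, Object>> inputRecords = Arrays.asList(
                record("naveen@domain.com", "A1"),
                record("test@domain.com", "A2"),
                record("test1@domain.com", "B1"));

        System.out.println("Matching Records: "
                + matching(dbRecords, inputRecords, fields).size());
        nonMatching(dbRecords, inputRecords, fields)
                .forEach(r -> System.out.println("Non Matching: " + r.get("Email")));
    }
}
```

With this data the program reports two matching records and test1@domain.com as the only non-matching one. Each list is traversed once and every lookup is a hash lookup, which is the point of building the index first.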

You just need to add a condition to the comparison:

dbRecords.parallelStream().filter(searchData -> 
                  inputRecords.parallelStream().anyMatch(inputMap ->
                                     searchData.get("Email").equals(inputMap.get("Email"))
                                     && searchData.get("Id").equals(inputMap.get("Id"))))
         .collect(Collectors.toList());

  • Add the same condition in the noneMatch().
  • Compute the average time using System.nanoTime(); it's more precise.
  • Try with and without .parallelStream() (just .stream()), because it is not certain that it helps you.
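
The three suggestions can be tried together in a small harness. This is a rough sketch and not a rigorous benchmark (there is no JIT warm-up control): the class NoneMatchTiming, the generator records and the run count of 20 are assumptions made for illustration, the data mimics the question's generators with "Id" as the key in both lists, and only the outer stream is switched between sequential and parallel.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class NoneMatchTiming {

    // Hypothetical generator in the style of the question: every step-th index below limit.
    static List<Map<String, Object>> records(int limit, int step) {
        List<Map<String, Object>> result = new ArrayList<>();
        for (int i = 0; i < limit; i += step) {
            Map<String, Object> r = new HashMap<>();
            r.put("Email", "naveen" + i + "@gmail.com");
            r.put("Id", "ID" + i);
            result.add(r);
        }
        return result;
    }

    // Input records for which no db record agrees on both Email and Id.
    static List<Map<String, Object>> nonMatching(List<Map<String, Object>> inputRecords,
            List<Map<String, Object>> dbRecords, boolean parallel) {
        Stream<Map<String, Object>> source =
                parallel ? inputRecords.parallelStream() : inputRecords.stream();
        return source
                .filter(input -> dbRecords.stream().noneMatch(db ->
                        Objects.equals(input.get("Email"), db.get("Email"))
                        && Objects.equals(input.get("Id"), db.get("Id"))))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Map<String, Object>> dbRecords = records(100, 2);
        List<Map<String, Object>> inputRecords = records(100, 1);
        int runs = 20; // assumed run count, only to get an average

        for (boolean parallel : new boolean[] {false, true}) {
            long totalNanos = 0;
            int size = 0;
            for (int i = 0; i < runs; i++) {
                long start = System.nanoTime();
                size = nonMatching(inputRecords, dbRecords, parallel).size();
                totalNanos += System.nanoTime() - start;
            }
            System.out.println((parallel ? "parallelStream" : "stream") + ": "
                    + size + " non-matching, average " + (totalNanos / runs / 1_000) + " us");
        }
    }
}
```

The timings depend on the machine and on JIT warm-up, so they are only indicative; with lists this small, the sequential version may well be the faster one.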

Here it is, mate...

One way to compare two Lists of Maps entry by entry, so that the matching and non-matching records are identified with multiple filter predicates in Java 8 Streams, is:

List<Map<String,String>> unMatchedRecords = dbRecords.parallelStream().filter(searchData ->
                inputRecords.parallelStream().noneMatch( inputMap ->
                        searchData.entrySet().stream().noneMatch(value ->
                                inputMap.entrySet().stream().noneMatch(value1 ->
                                        (value1.getKey().equals(value.getKey()) &&
                                                value1.getValue().equals(value.getValue()))))
                )).collect(Collectors.toList());

Note:

  1. If the Map<String,String> used above is a Map<Object,Object> instead, don't forget to call .toString() on .getKey() and value.getKey().

  2. The unmatched records obtained this way can easily be subtracted from either list (i.e. dbRecords or inputRecords) to retrieve the matching records, and that operation is quick.
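
The nested noneMatch calls above amount to asking whether some input record contains every entry of the db record. Assuming that reading is right, the same check can be sketched with containsAll on the entry sets, followed by the subtraction from note 2. The class EntryWiseMatcher and its methods are hypothetical names, and the subtraction is done against dbRecords because that is where the unmatched records come from.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class EntryWiseMatcher {

    // Hypothetical helper: one record with the two keys from the question.
    static Map<String, Object> record(String email, String id) {
        Map<String, Object> m = new HashMap<>();
        m.put("Email", email);
        m.put("Id", id);
        return m;
    }

    // db records for which no input record contains all of their entries.
    static List<Map<String, Object>> unmatched(List<Map<String, Object>> dbRecords,
            List<Map<String, Object>> inputRecords) {
        return dbRecords.stream()
                .filter(db -> inputRecords.stream()
                        .noneMatch(input -> input.entrySet().containsAll(db.entrySet())))
                .collect(Collectors.toList());
    }

    // Note 2: the matched records are dbRecords minus the unmatched ones.
    static List<Map<String, Object>> matched(List<Map<String, Object>> dbRecords,
            List<Map<String, Object>> unmatched) {
        List<Map<String, Object>> result = new ArrayList<>(dbRecords);
        result.removeAll(unmatched); // compares maps by content (Map.equals)
        return result;
    }

    public static void main(String[] args) {
        List<Map<String, Object>> dbRecords = new ArrayList<>();
        dbRecords.add(record("naveen@domain.com", "A1"));
        dbRecords.add(record("test@domain.com", "A2"));
        List<Map<String, Object>> inputRecords = new ArrayList<>();
        inputRecords.add(record("naveen@domain.com", "A1"));
        inputRecords.add(record("test@domain.com", "X9")); // same Email, different Id

        List<Map<String, Object>> unmatched = unmatched(dbRecords, inputRecords);
        System.out.println("Unmatched: " + unmatched.size());
        System.out.println("Matched: " + matched(dbRecords, unmatched).size());
    }
}
```

This is still a nested linear search, and removeAll on a list scans as well, so it suits small lists; it compares every entry of the db record without naming the keys.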

Cheers,

Shubham Chauhan

Why not use && inside anyMatch:

anyMatch(inputMap -> searchData.get("Email").equals(inputMap.get("Email")) 
                     && searchData.get("Id").equals(inputMap.get("Id")))

And I doubt you actually need parallelStream; on the other hand, you do need System.nanoTime instead of currentTimeMillis.
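
If the list of compared fields is likely to grow, the && chain can also be written as composed predicates. The following is a minimal sketch, assuming records keyed by "Email" and "Id"; the class ComposedPredicates and the helpers record, sameField and matching are hypothetical names, while BiPredicate.and is the standard default method of java.util.function.BiPredicate.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.function.BiPredicate;
import java.util.stream.Collectors;

public class ComposedPredicates {

    // Hypothetical helper: one record with the two keys from the question.
    static Map<String, Object> record(String email, String id) {
        Map<String, Object> m = new HashMap<>();
        m.put("Email", email);
        m.put("Id", id);
        return m;
    }

    // Predicate that is true when two records hold equal values for one field.
    static BiPredicate<Map<String, Object>, Map<String, Object>> sameField(String field) {
        return (a, b) -> Objects.equals(a.get(field), b.get(field));
    }

    // Equivalent to the && inside anyMatch, built from reusable parts.
    static final BiPredicate<Map<String, Object>, Map<String, Object>> SAME_EMAIL_AND_ID =
            sameField("Email").and(sameField("Id"));

    // db records that have a counterpart in inputRecords on both fields.
    static List<Map<String, Object>> matching(List<Map<String, Object>> dbRecords,
            List<Map<String, Object>> inputRecords) {
        return dbRecords.stream()
                .filter(db -> inputRecords.stream()
                        .anyMatch(input -> SAME_EMAIL_AND_ID.test(db, input)))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Map<String, Object>> dbRecords = new ArrayList<>();
        dbRecords.add(record("naveen@domain.com", "A1"));
        dbRecords.add(record("test@domain.com", "A2"));
        List<Map<String, Object>> inputRecords = new ArrayList<>();
        inputRecords.add(record("naveen@domain.com", "A1"));
        inputRecords.add(record("test@domain.com", "X9")); // same Email, different Id

        matching(dbRecords, inputRecords)
                .forEach(r -> System.out.println("Match: " + r.get("Email")));
    }
}
```

Adding a third criterion is then one more .and(sameField(...)) call; the search itself remains the nested linear scan of the original snippet.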
