
How to improve performance when comparing two maps in Java

I have two maps: Map<String, List<String>> input and Map<String, List<String>> output.

Input map:

{A=[Apple.txt, Axe.txt, Aid.txt], B=[Ball.txt, Boy.txt, Box.txt], C=[Cow.txt, Cob.txt]}

Output map:

{A=[Apple.txt, Axe.txt, Aid.txt], B=[Ball.txt, Boy.txt]}

I need to find the key-value pairs that are missing from the output map.

Expected output: B=[Box.txt], C=[Cow.txt, Cob.txt]

In other words, I need to detect that the output map is missing Box.txt under key B, and that the whole C key-value pair is missing.

My current approach uses a forEach over the input map (O(n)) and, inside it, a stream over the output map's entry set (O(m)), which results in O(n*m) time complexity.

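inputMap.forEach((key, value) -> {
    // For every input key, the whole output map is scanned: O(n) * O(m) overall.
    // (inputFile is not declared in this snippet; presumably one of the files in value.)
    final List<String> countrifiedFolderList = outputFileMap.entrySet().stream()
            .filter(entry -> entry.getKey().contains(key))
            .filter(entry -> !entry.getValue().contains(inputFile))
            .map(Map.Entry::getKey)
            .collect(Collectors.toList());

    if (!countrifiedFolderList.isEmpty()) {
        // ... do processing
    }
});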

The maps contain a huge amount of data, so I need to improve performance and get the result in less than O(n*m) time.

Why not:

map1.keySet().containsAll(map2.keySet());
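A key-only check like this answers a narrower question: it tells you whether every key of map2 exists in map1, but not which values are missing under a key both maps share. The minimal sketch below takes map1 and map2 to be the question's input and output maps (the class and method names are illustrative only). It computes the missing keys with a HashSet and shows that Box.txt under B goes unreported by a key-level check:

```java
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class KeyDifference {

    // Illustrative helper: keys present in input but absent from output.
    // removeAll relies on hash lookups against a Set, so this is roughly
    // linear in the number of keys rather than quadratic.
    static Set<String> missingKeys(Map<String, List<String>> input,
                                   Map<String, List<String>> output) {
        Set<String> missing = new HashSet<>(input.keySet());
        missing.removeAll(output.keySet());
        return missing;
    }

    public static void main(String[] args) {
        Map<String, List<String>> input = Map.of(
                "A", List.of("Apple.txt", "Axe.txt", "Aid.txt"),
                "B", List.of("Ball.txt", "Boy.txt", "Box.txt"),
                "C", List.of("Cow.txt", "Cob.txt"));
        Map<String, List<String>> output = Map.of(
                "A", List.of("Apple.txt", "Axe.txt", "Aid.txt"),
                "B", List.of("Ball.txt", "Boy.txt"));

        System.out.println(input.keySet().containsAll(output.keySet())); // true
        System.out.println(missingKeys(input, output));                   // [C]
        // Box.txt under B is not reported: a key-only check cannot see it.
    }
}
```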

Update

With one stream:

Map<String, List<String>> result = input.entrySet().stream()
        // keep entries whose key is missing, or whose values are not all present
        .filter(entry -> !output.containsKey(entry.getKey()) ||
                !output.get(entry.getKey()).containsAll(entry.getValue()))
        .map(entry -> {
                // copy the expected values and remove the ones already in the output
                List<String> expected = new ArrayList<>(entry.getValue());
                List<String> current = output.get(entry.getKey());
                expected.removeAll(current != null ? current : List.of());
                return Map.entry(entry.getKey(), expected);
            })
        .collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue));
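A self-contained version of this one-stream answer, run against the question's sample data, could look like the sketch below. It assumes Java 9+ (for Map.entry and List.of), and the class and method names are illustrative. Note that containsAll and removeAll with a List argument do a linear search per element, so for long value lists, converting the output lists to HashSets would make the per-key comparison cheaper.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class OneStreamDiff {

    // The pipeline from the answer, wrapped in a method.
    static Map<String, List<String>> missing(Map<String, List<String>> input,
                                             Map<String, List<String>> output) {
        return input.entrySet().stream()
                // keep entries whose key is absent, or whose values are not all present
                .filter(entry -> !output.containsKey(entry.getKey())
                        || !output.get(entry.getKey()).containsAll(entry.getValue()))
                .map(entry -> {
                    // copy the expected values and drop those already in the output
                    List<String> expected = new ArrayList<>(entry.getValue());
                    List<String> current = output.get(entry.getKey());
                    expected.removeAll(current != null ? current : List.of());
                    return Map.entry(entry.getKey(), expected);
                })
                .collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue));
    }

    public static void main(String[] args) {
        Map<String, List<String>> input = Map.of(
                "A", List.of("Apple.txt", "Axe.txt", "Aid.txt"),
                "B", List.of("Ball.txt", "Boy.txt", "Box.txt"),
                "C", List.of("Cow.txt", "Cob.txt"));
        Map<String, List<String>> output = Map.of(
                "A", List.of("Apple.txt", "Axe.txt", "Aid.txt"),
                "B", List.of("Ball.txt", "Boy.txt"));

        // prints {B=[Box.txt], C=[Cow.txt, Cob.txt]}
        System.out.println(missing(input, output));
    }
}
```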

If you want to measure performance, I would suggest writing a micro-benchmark with your own data structures, sample sizes, hardware, etc. If you are interested in micro-benchmarking, I would suggest using JMH.

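If they are TreeMaps, then their keys are already sorted, so you could walk both key sets together in a single linear pass, O(n + m). Oboe's solution is the best you'll get with HashMaps: with expected O(1) key lookups it is roughly linear in the number of keys, and O(n*log2(m)) in the worst case.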
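The merge-style walk over two TreeMaps could be sketched as follows. This assumes both maps use natural String ordering (no custom Comparator); the class name SortedKeyWalk is illustrative. The key walk itself is O(n + m); for keys present in both maps, the values are compared through a HashSet, which adds work linear in the list sizes.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

public class SortedKeyWalk {

    // Walks both sorted entry sets once, like the merge step of merge sort.
    // Assumes natural String ordering in both TreeMaps.
    static Map<String, List<String>> missing(TreeMap<String, List<String>> input,
                                             TreeMap<String, List<String>> output) {
        Map<String, List<String>> result = new LinkedHashMap<>(); // keeps sorted order
        Iterator<Map.Entry<String, List<String>>> out = output.entrySet().iterator();
        Map.Entry<String, List<String>> o = out.hasNext() ? out.next() : null;

        for (Map.Entry<String, List<String>> e : input.entrySet()) {
            // advance the output cursor past keys that sort before the current input key
            while (o != null && o.getKey().compareTo(e.getKey()) < 0) {
                o = out.hasNext() ? out.next() : null;
            }
            if (o == null || !o.getKey().equals(e.getKey())) {
                // the whole key is missing from the output
                result.put(e.getKey(), new ArrayList<>(e.getValue()));
            } else {
                // same key in both maps: keep only the values the output lacks
                Set<String> present = new HashSet<>(o.getValue());
                List<String> absent = new ArrayList<>();
                for (String v : e.getValue()) {
                    if (!present.contains(v)) {
                        absent.add(v);
                    }
                }
                if (!absent.isEmpty()) {
                    result.put(e.getKey(), absent);
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        TreeMap<String, List<String>> input = new TreeMap<>(Map.of(
                "A", List.of("Apple.txt", "Axe.txt", "Aid.txt"),
                "B", List.of("Ball.txt", "Boy.txt", "Box.txt"),
                "C", List.of("Cow.txt", "Cob.txt")));
        TreeMap<String, List<String>> output = new TreeMap<>(Map.of(
                "A", List.of("Apple.txt", "Axe.txt", "Aid.txt"),
                "B", List.of("Ball.txt", "Boy.txt")));

        // prints {B=[Box.txt], C=[Cow.txt, Cob.txt]}
        System.out.println(missing(input, output));
    }
}
```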

A couple of things could simplify the solution further: declaring the output map as a Map<String, Set<String>>, and accepting that keys whose values are all present in the output map show up in the result with an empty list [].

Map<String, List<String>> lookUpExclusives(Map<String, List<String>> input,
                                           Map<String, Set<String>> output) {
    return input.entrySet().stream()
            .collect(Collectors.toMap(Map.Entry::getKey,
                    // keep only the values missing from this key's output set;
                    // a key absent from output falls back to an empty set, so all its values are kept
                    e -> e.getValue().stream()
                            .filter(val -> !output.getOrDefault(e.getKey(),
                                    Collections.emptySet()).contains(val))
                            .collect(Collectors.toList())));
}

This would return {A=[], B=[Box.txt], C=[Cow.txt, Cob.txt]} from the method. In terms of complexity, for each of the N entries of the input map it does one set lookup per value, i.e. M lookups for a list of M values, so the total is O(N*M) as well. Because the output values are sets, each lookup is an expected O(1) hash lookup, so this is linear in the total number of input values, which is about the best possible runtime since every input value has to be checked at least once.
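Since the question's output map is a Map<String, List<String>>, the method above needs the output values converted to sets first. A minimal runnable sketch follows; the helper name toSetValues is an assumption and not part of the answer. The one-time conversion costs time linear in the size of the output map, after which each contains check is an expected O(1) hash lookup.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

public class LookUpExclusivesDemo {

    // The answer's method, made static so it can be called from main.
    static Map<String, List<String>> lookUpExclusives(Map<String, List<String>> input,
                                                      Map<String, Set<String>> output) {
        return input.entrySet().stream()
                .collect(Collectors.toMap(Map.Entry::getKey,
                        e -> e.getValue().stream()
                                .filter(val -> !output.getOrDefault(e.getKey(),
                                        Collections.emptySet()).contains(val))
                                .collect(Collectors.toList())));
    }

    // Hypothetical helper: copies each List value into a HashSet once.
    static Map<String, Set<String>> toSetValues(Map<String, List<String>> map) {
        Map<String, Set<String>> result = new HashMap<>();
        map.forEach((key, values) -> result.put(key, new HashSet<>(values)));
        return result;
    }

    public static void main(String[] args) {
        Map<String, List<String>> input = Map.of(
                "A", List.of("Apple.txt", "Axe.txt", "Aid.txt"),
                "B", List.of("Ball.txt", "Boy.txt", "Box.txt"),
                "C", List.of("Cow.txt", "Cob.txt"));
        Map<String, List<String>> output = Map.of(
                "A", List.of("Apple.txt", "Axe.txt", "Aid.txt"),
                "B", List.of("Ball.txt", "Boy.txt"));

        // prints {A=[], B=[Box.txt], C=[Cow.txt, Cob.txt]}
        System.out.println(lookUpExclusives(input, toSetValues(output)));
    }
}
```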

Once you have this result, you can chain another stream operation to filter out entries that have no values left in the result (e.g. A=[]). This can be done by appending the following code to the above pipeline, after the first collect:

.entrySet().stream()
.filter(e -> !e.getValue().isEmpty())
.collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue));

The total complexity is O(N*M) + O(N), which simplifies to O(N*M). The advantage here is that you get the result in exactly the format you expected: {B=[Box.txt], C=[Cow.txt, Cob.txt]}.
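Putting both stages together, a self-contained sketch of the complete pipeline might look like this. The method name lookUpMissing is hypothetical, and the output map's values are assumed to already be Sets, as the answer suggests. The second stage re-streams the intermediate map, which is the extra O(N) mentioned above.

```java
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

public class ChainedFilterDemo {

    // The answer's pipeline with the filtering stage appended after the first collect.
    static Map<String, List<String>> lookUpMissing(Map<String, List<String>> input,
                                                   Map<String, Set<String>> output) {
        return input.entrySet().stream()
                // stage 1: for every key, keep only the values absent from the output set
                .collect(Collectors.toMap(Map.Entry::getKey,
                        e -> e.getValue().stream()
                                .filter(val -> !output.getOrDefault(e.getKey(),
                                        Collections.emptySet()).contains(val))
                                .collect(Collectors.toList())))
                // stage 2: drop keys whose value list ended up empty (e.g. A=[])
                .entrySet().stream()
                .filter(e -> !e.getValue().isEmpty())
                .collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue));
    }

    public static void main(String[] args) {
        Map<String, List<String>> input = Map.of(
                "A", List.of("Apple.txt", "Axe.txt", "Aid.txt"),
                "B", List.of("Ball.txt", "Boy.txt", "Box.txt"),
                "C", List.of("Cow.txt", "Cob.txt"));
        Map<String, Set<String>> output = Map.of(
                "A", Set.of("Apple.txt", "Axe.txt", "Aid.txt"),
                "B", Set.of("Ball.txt", "Boy.txt"));

        // prints {B=[Box.txt], C=[Cow.txt, Cob.txt]}
        System.out.println(lookUpMissing(input, output));
    }
}
```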

The technical posts on this site follow the CC BY-SA 4.0 license. If you need to reprint, please indicate the site URL or the original address. For any questions, please contact: yoyou2525@163.com.

 