How to improve performance when processing two maps in Java

Question

I have two maps: Map<String, List<String>> input and Map<String, List<String>> output.

input map

{A=[Apple.txt, Axe.txt, Aid.txt], B=[Ball.txt, Boy.txt,Box.txt], C=[Cow.txt,Cob.txt]}

output map

{A=[Apple.txt, Axe.txt, Aid.txt], B=[Ball.txt, Boy.txt]}

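I need to find the key-value pairs that are missing from the output map.

Expected output: B=[Box.txt], C=[Cow.txt, Cob.txt]

I need to determine that the output map is missing Box.txt under key B and is missing the "C" key-value pair entirely.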

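My current approach: I use a forEach (time complexity O(n)) and a stream over the entry set (time complexity O(m)) across the two maps, which results in O(n*m) time complexity.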

inputMap.forEach((key,value) ->
    {
    final List<Path> countrifiedFolderList = outputFileMap.entrySet().stream()
            .filter(entry -> entry.getKey().contains(key))
            .filter(files -> !files.getValue().contains(inputFile)).map(Map.Entry::getKey)
            .collect(Collectors.toList());

    if (!countrifiedFolderList.isEmpty())
    {
        // ....do processing
    }
    });

I need to improve performance because the maps contain a large amount of data. I need to get the result with a time complexity lower than O(n*m).

Answer 1

Why not:

map1.keySet().containsAll(map2.keySet());

Update

Using a stream:

Map<String, List<String>> result = input.entrySet().stream()
        // Keep only entries whose key, or at least one value, is missing from the output map.
        .filter(entry -> !output.containsKey(entry.getKey()) ||
                !output.get(entry.getKey()).containsAll(entry.getValue()))
        .map(entry -> {
                // Copy the input values, then drop those already present in the output map.
                List<String> expected = new ArrayList<>(entry.getValue());
                List<String> current = output.get(entry.getKey());
                expected.removeAll(current != null ? current : List.of());
                return Map.entry(entry.getKey(), expected);
            })
        .collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue));
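
For anyone who wants to try this on the sample data from the question, here is a minimal, self-contained sketch of the same idea. It is not part of the original answer: the class name MissingEntries and the method findMissing are made up for illustration, and it assumes Java 9 or later (for List.of). It also copies each output list into a HashSet before the lookups, which is an assumption about what is acceptable for your data, so that each membership check takes constant time on average.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

class MissingEntries {

    // Hypothetical helper: for each input key, collects the values that the
    // output map lacks. Keys with nothing missing are left out of the result.
    static Map<String, List<String>> findMissing(Map<String, List<String>> input,
                                                 Map<String, List<String>> output) {
        Map<String, List<String>> missing = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> entry : input.entrySet()) {
            // Copy the output list into a HashSet so each lookup is O(1) on average.
            Set<String> present =
                    new HashSet<>(output.getOrDefault(entry.getKey(), List.of()));
            List<String> absent = new ArrayList<>();
            for (String file : entry.getValue()) {
                if (!present.contains(file)) {
                    absent.add(file);
                }
            }
            if (!absent.isEmpty()) {
                missing.put(entry.getKey(), absent);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Sample data taken from the question.
        Map<String, List<String>> input = new LinkedHashMap<>();
        input.put("A", List.of("Apple.txt", "Axe.txt", "Aid.txt"));
        input.put("B", List.of("Ball.txt", "Boy.txt", "Box.txt"));
        input.put("C", List.of("Cow.txt", "Cob.txt"));

        Map<String, List<String>> output = new LinkedHashMap<>();
        output.put("A", List.of("Apple.txt", "Axe.txt", "Aid.txt"));
        output.put("B", List.of("Ball.txt", "Boy.txt"));

        System.out.println(findMissing(input, output));
        // prints {B=[Box.txt], C=[Cow.txt, Cob.txt]}
    }
}
```

With the set copy, the work is roughly proportional to the total number of file names in both maps, rather than to the product of the list sizes.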

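If you want to measure performance, I suggest running a microbenchmark with your own data structures, sample sizes, hardware, and so on. If you are interested in microbenchmarking, I recommend JMH.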

Answer 2

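If they are TreeMaps, then their keys are already sorted. You can walk through both key sets simultaneously in O(n). oboe's solution is the best you are going to get with HashMaps, and will be O(n*log2(m)).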
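
The simultaneous walk over two sorted maps could look roughly like the sketch below. This is an illustration only, not code from the answer: the class name SortedMapDiff and the method findMissing are hypothetical, and it assumes both TreeMaps use the natural String ordering (no custom comparator). The value lists are compared through a temporary HashSet, which is a separate choice from the key traversal.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class SortedMapDiff {

    // Hypothetical helper: walks both sorted maps in step, so each key of
    // either map is visited once.
    static Map<String, List<String>> findMissing(TreeMap<String, List<String>> input,
                                                 TreeMap<String, List<String>> output) {
        // LinkedHashMap keeps the (already sorted) insertion order.
        Map<String, List<String>> missing = new LinkedHashMap<>();
        Iterator<Map.Entry<String, List<String>>> outIt = output.entrySet().iterator();
        Map.Entry<String, List<String>> out = outIt.hasNext() ? outIt.next() : null;

        for (Map.Entry<String, List<String>> in : input.entrySet()) {
            // Skip output keys that sort before the current input key.
            while (out != null && out.getKey().compareTo(in.getKey()) < 0) {
                out = outIt.hasNext() ? outIt.next() : null;
            }
            List<String> absent = new ArrayList<>(in.getValue());
            if (out != null && out.getKey().equals(in.getKey())) {
                // Same key in both maps: drop the values the output already has.
                absent.removeAll(new HashSet<>(out.getValue()));
            }
            if (!absent.isEmpty()) {
                missing.put(in.getKey(), absent);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Sample data taken from the question.
        TreeMap<String, List<String>> input = new TreeMap<>();
        input.put("A", List.of("Apple.txt", "Axe.txt", "Aid.txt"));
        input.put("B", List.of("Ball.txt", "Boy.txt", "Box.txt"));
        input.put("C", List.of("Cow.txt", "Cob.txt"));

        TreeMap<String, List<String>> output = new TreeMap<>();
        output.put("A", List.of("Apple.txt", "Axe.txt", "Aid.txt"));
        output.put("B", List.of("Ball.txt", "Boy.txt"));

        System.out.println(findMissing(input, output));
        // prints {B=[Box.txt], C=[Cow.txt, Cob.txt]}
    }
}
```

The output iterator only ever moves forward, so the key comparison does no per-key lookup in the second map.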

Answer 3

A few things could simplify the solution a bit more: treat the output map as a Map<String, Set<String>>, and in the final result represent keys that are completely present in the output map with an empty list [].

Map<String, List<String>> lookUpExclusives(Map<String, List<String>> input,
                                                  Map<String, Set<String>> output) {
    return input.entrySet().stream()
            .collect(Collectors.toMap(Map.Entry::getKey,
                    e -> e.getValue().stream()
                            .filter(val -> !output.getOrDefault(e.getKey(),
                                    Collections.emptySet()).contains(val))
                            .collect(Collectors.toList())));
}

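This returns {A=[], B=[Box.txt], C=[Cow.txt, Cob.txt]} from the method. In terms of complexity, it performs one lookup for each of the up to M elements in an entry's value list, and does so for each of the N entries of the input map, so it is still O(N*M), but that should be about as optimized as the runtime complexity can get.

Now that you have this runtime complexity, you can chain a further stream operation to filter out the entries that have no corresponding values in the result (for example A=[]). This can be done by appending the following code to the pipeline above, after the first collect: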

.entrySet().stream()
.filter(e -> !e.getValue().isEmpty())
.collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue));

This leaves the complexity at O(N*M) + O(N), which is effectively O(N*M). The advantage is that you get the result in the expected format, for example {B=[Box.txt], C=[Cow.txt, Cob.txt]}.
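
Putting the two pieces together, a runnable sketch might look like the following. The class name ExclusivesDemo and the helper toSets are hypothetical and not part of the answer; toSets is only there because the question's output map holds lists, and it assumes a one-time copy into sets is acceptable.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

class ExclusivesDemo {

    // Hypothetical helper: copies each value list into a HashSet.
    static Map<String, Set<String>> toSets(Map<String, List<String>> map) {
        Map<String, Set<String>> result = new HashMap<>();
        map.forEach((key, values) -> result.put(key, new HashSet<>(values)));
        return result;
    }

    // The method from the answer, with the empty-entry filter chained on.
    static Map<String, List<String>> lookUpExclusives(Map<String, List<String>> input,
                                                      Map<String, Set<String>> output) {
        return input.entrySet().stream()
                .collect(Collectors.toMap(Map.Entry::getKey,
                        e -> e.getValue().stream()
                                .filter(val -> !output.getOrDefault(e.getKey(),
                                        Collections.emptySet()).contains(val))
                                .collect(Collectors.toList())))
                .entrySet().stream()
                // Drop keys such as A=[] that have nothing missing.
                .filter(e -> !e.getValue().isEmpty())
                .collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue));
    }

    public static void main(String[] args) {
        // Sample data taken from the question.
        Map<String, List<String>> input = new HashMap<>();
        input.put("A", List.of("Apple.txt", "Axe.txt", "Aid.txt"));
        input.put("B", List.of("Ball.txt", "Boy.txt", "Box.txt"));
        input.put("C", List.of("Cow.txt", "Cob.txt"));

        Map<String, List<String>> output = new HashMap<>();
        output.put("A", List.of("Apple.txt", "Axe.txt", "Aid.txt"));
        output.put("B", List.of("Ball.txt", "Boy.txt"));

        // The result holds B=[Box.txt] and C=[Cow.txt, Cob.txt];
        // HashMap iteration order is not guaranteed.
        System.out.println(lookUpExclusives(input, toSets(output)));
    }
}
```

If the order of keys in the result matters, the four-argument Collectors.toMap overload with a LinkedHashMap or TreeMap supplier is one option.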

3 solutions

Solution 1
0 Accepted 2020-07-14 20:56:37

Solution 2
0 2020-07-14 22:15:39

Solution 3
0
