Merge, sort, and limit Map streams using Java 8

Question

I have two maps of type Map<String, Long>. I want to merge both maps, sort the result in descending order by value, and get the top 5 entries. If the merge produces duplicate keys, I need to sum their values. I have the following code, which works:

Map<String, Long> topFive = (Stream.concat(map1.entrySet().stream(), 
                                           map2.entrySet().stream())
                                   .collect(Collectors.toMap(Map.Entry::getKey, 
                                                             Map.Entry::getValue,
                                                             Long::sum)))
                                   .entrySet()
                                   .stream()
                                   .sorted(Map.Entry.comparingByValue(Comparator.reverseOrder()))
                                   .limit(5)
                                   .collect(Collectors.toMap(Map.Entry::getKey,
                                                             Map.Entry::getValue,
                                                            (v1, v2) -> v1,
                                                            LinkedHashMap::new));

But I would like to know if there is a better solution.
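To make the requirement concrete, here is a minimal runnable sketch that wraps the pipeline above in a method and runs it on sample data. The class name MergeSortLimitDemo, the method name topFive, and the sample maps are made up for illustration; they are not part of the original question.

```java
import java.util.Comparator;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical demo class; the sample data below is invented for illustration.
class MergeSortLimitDemo {

    // Same pipeline as in the question, wrapped in a method.
    static Map<String, Long> topFive(Map<String, Long> map1, Map<String, Long> map2) {
        return Stream.concat(map1.entrySet().stream(), map2.entrySet().stream())
                // Merge: values of duplicate keys are summed.
                .collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue, Long::sum))
                .entrySet()
                .stream()
                // Sort by value, largest first.
                .sorted(Map.Entry.comparingByValue(Comparator.reverseOrder()))
                .limit(5)
                // A LinkedHashMap preserves the sorted order.
                .collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue,
                        (v1, v2) -> v1, LinkedHashMap::new));
    }

    public static void main(String[] args) {
        Map<String, Long> map1 = new HashMap<>();
        map1.put("a", 10L); map1.put("b", 4L); map1.put("c", 7L); map1.put("d", 1L);
        Map<String, Long> map2 = new HashMap<>();
        map2.put("b", 9L); map2.put("e", 2L); map2.put("f", 3L); map2.put("g", 20L);

        // "b" appears in both maps, so its values are summed: 4 + 9 = 13.
        System.out.println(topFive(map1, map2)); // prints {g=20, b=13, a=10, c=7, f=3}
    }
}
```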

Answer 1

If you mean better in terms of performance, and you have large collections but need only a few top elements, you can avoid sorting the entire map, which has n*log(n) complexity.

If you already use Guava, you can use MinMaxPriorityQueue to keep only the best N results, and then sort just those N elements (N being a small constant).

// Assumes: import java.util.Map.Entry; import static java.util.Comparator.reverseOrder;
Comparator<Entry<String, Long>> comparator = Entry.comparingByValue(reverseOrder());

Map<String, Long> merged = Stream.of(map1, map2)
        .map(Map::entrySet)
        .flatMap(Set::stream)
        .collect(Collectors.toMap(Map.Entry::getKey, 
                Map.Entry::getValue, 
                Long::sum));

MinMaxPriorityQueue<Entry<String, Long>> tops = MinMaxPriorityQueue.orderedBy(comparator)
        .maximumSize(5)
        .create(merged.entrySet());

Map<String, Long> sorted = tops.stream()
        .sorted(comparator)
        .collect(Collectors.toMap(Map.Entry::getKey, 
                Map.Entry::getValue,
                (m1, m2) -> m1,
                LinkedHashMap::new));

If you don't have or don't want to use Guava, you can simulate MinMaxPriorityQueue with a custom TreeSet. The anonymous class below is only meant to show the functionality; if you prefer, you can create a named class that receives the maximum size in its constructor.

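// Note: a TreeSet uses its comparator to detect duplicates, so with a value-only
// comparator an entry whose value equals one already in the set is not added.
Set<Entry<String, Long>> sorted = new TreeSet<Entry<String, Long>>(comparator) {
    @Override
    public boolean add(Entry<String, Long> entry) {
        if (size() < 5) { // 5 can be a constructor argument in a custom class
            return super.add(entry);
        } else if (comparator().compare(last(), entry) > 0) {
            // The new entry sorts before the current last (worst) one: replace it
            remove(last());
            return super.add(entry);
        } else {
            return false;
        }
    }
};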

Then add all the entries of the merged map to the set, which keeps only the top ones.

sorted.addAll(merged.entrySet());

You can also change the merge step to use something similar to the merge mentioned by Federico.

Map<String, Long> merged = new HashMap<>(map1);
map2.forEach((k, v) -> merged.merge(k, v, Long::sum));

This tends to be faster than using streams. Once you have the merged map, you can select the top N elements with MinMaxPriorityQueue or TreeSet, again avoiding the unnecessary sort of the entire collection.
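As a rough sketch of the named-class variant mentioned above, the following combines Map.merge with a size-bounded TreeSet. The names BoundedTreeSet, TopNWithTreeSet, and topN are hypothetical. The comparator here additionally breaks ties by key, which is an assumption on my part so that entries with equal values are not treated as duplicates by the TreeSet.

```java
import java.util.Comparator;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Map.Entry;
import java.util.TreeSet;

// Hypothetical helper: a TreeSet that never holds more than maxSize elements.
// Assumes a non-null comparator is supplied.
class BoundedTreeSet<E> extends TreeSet<E> {
    private final int maxSize;

    BoundedTreeSet(Comparator<? super E> comparator, int maxSize) {
        super(comparator);
        if (maxSize <= 0) {
            throw new IllegalArgumentException("maxSize must be positive");
        }
        this.maxSize = maxSize;
    }

    @Override
    public boolean add(E element) {
        if (contains(element)) {
            return false; // already present according to the comparator
        }
        if (size() < maxSize) {
            return super.add(element);
        }
        // last() is the worst element kept so far; replace it only if the new one sorts before it.
        if (comparator().compare(element, last()) < 0) {
            pollLast();
            return super.add(element);
        }
        return false;
    }
}

// Hypothetical entry point that merges two maps and keeps the n largest values.
class TopNWithTreeSet {
    static Map<String, Long> topN(Map<String, Long> map1, Map<String, Long> map2, int n) {
        // Merge without streams, summing values of duplicate keys.
        Map<String, Long> merged = new HashMap<>(map1);
        map2.forEach((k, v) -> merged.merge(k, v, Long::sum));

        // Largest value first; ties are broken by key (an added assumption).
        Comparator<Entry<String, Long>> byValueDesc =
                Entry.<String, Long>comparingByValue(Comparator.reverseOrder())
                        .thenComparing(Entry.comparingByKey());

        BoundedTreeSet<Entry<String, Long>> top = new BoundedTreeSet<>(byValueDesc, n);
        // Call add() explicitly so every entry goes through the size check.
        merged.entrySet().forEach(top::add);

        // Copy into a LinkedHashMap to keep the descending order.
        Map<String, Long> result = new LinkedHashMap<>();
        top.forEach(e -> result.put(e.getKey(), e.getValue()));
        return result;
    }
}
```

Calling TopNWithTreeSet.topN(map1, map2, 5) would then return the five largest merged entries in descending order, without sorting the whole merged map.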

Answer 2

A better solution might be to use an accumulator that keeps only the top 5, rather than sorting the whole stream. Right now you are doing an estimated n * log(n) comparisons instead of something between n and n * log(5).
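One possible way to build such an accumulator is a bounded min-heap, sketched below with java.util.PriorityQueue. This is only an illustration of the idea, not necessarily what the answer's author had in mind; the class name TopNAccumulator and the method topN are hypothetical, and the input is assumed to be the already merged map with non-null values.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

// Hypothetical helper illustrating a "keep only the top n" accumulator.
class TopNAccumulator {

    static Map<String, Long> topN(Map<String, Long> merged, int n) {
        Map<String, Long> result = new LinkedHashMap<>();
        if (n <= 0) {
            return result;
        }

        // Min-heap on value: the head is always the smallest of the entries kept so far.
        Comparator<Map.Entry<String, Long>> byValue = Map.Entry.comparingByValue();
        PriorityQueue<Map.Entry<String, Long>> heap = new PriorityQueue<>(n, byValue);

        for (Map.Entry<String, Long> entry : merged.entrySet()) {
            if (heap.size() < n) {
                heap.add(entry);
            } else if (entry.getValue() > heap.peek().getValue()) {
                // The new entry beats the smallest kept entry: swap them.
                heap.poll();
                heap.add(entry);
            }
        }

        // Only the (at most n) surviving entries are sorted.
        List<Map.Entry<String, Long>> top = new ArrayList<>(heap);
        top.sort(Map.Entry.comparingByValue(Comparator.reverseOrder()));
        top.forEach(e -> result.put(e.getKey(), e.getValue()));
        return result;
    }
}
```

Each of the n entries costs at most one heap operation on a heap of size 5, which is where the bound between n and n * log(5) comparisons comes from. The order among entries with equal values is not specified in this sketch.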

Answer 3

I would focus on making the code easier to read:

// Merge
Map<String, Long> merged = new HashMap<>(map1);
map2.forEach((k, v) -> merged.merge(k, v, Long::sum));

// Sort descending
List<Map.Entry<String, Long>> list = new ArrayList<>(merged.entrySet());
list.sort(Map.Entry.comparingByValue(Comparator.reverseOrder()));

// Select top entries
Map<String, Long> top5 = new LinkedHashMap<>();
list.subList(0, Math.min(5, list.size()))
    .forEach(e -> top5.put(e.getKey(), e.getValue()));

Also, because it does not use streams, this solution should perform better.

Answer 4

Here is another solution, using a Collector. It uses a TreeSet as the intermediate accumulation type and converts the set to a map in the finisher.

private <K, V, E extends Map.Entry<K,V>> Collector<E, TreeSet<E>, Map<K,V>> 
        toMap(BinaryOperator<V> mergeFunction, Comparator<E> comparator, int limit) {
    Objects.requireNonNull(mergeFunction);
    Objects.requireNonNull(comparator);

    Supplier<TreeSet<E>> supplier = () -> new TreeSet<>(comparator);
    BiConsumer<TreeSet<E>, E> accumulator = (set, entry) -> accumulate(set, entry, mergeFunction);
    BinaryOperator<TreeSet<E>> combiner = (destination, source) -> {
            source.forEach(e -> accumulator.accept(destination, e)); return destination; };
    Function<TreeSet<E>, Map<K,V>> finisher = s -> s.stream()
            .limit(limit)
            .collect(Collectors.toMap(E::getKey, E::getValue, (v1, v2) -> v1, LinkedHashMap::new));

    return Collector.of(supplier, accumulator, combiner, finisher);
}

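private <K, V, E extends Map.Entry<K,V>> void accumulate(
        TreeSet<E> set, E newEntry, BinaryOperator<V> mergeFunction) {
    // Linear scan by key: the set is ordered by the comparator, not by key
    Optional<E> entryFound = set.stream()
            .filter(e -> Objects.equals(e.getKey(), newEntry.getKey()))
            .findFirst();

    if (entryFound.isPresent()) {
        // Remove the entry before changing its value, then re-insert it so it is re-sorted.
        // Caveat: for entries taken from a HashMap's entrySet(), setValue also modifies that map.
        E existingEntry = entryFound.get();
        set.remove(existingEntry);
        existingEntry.setValue(mergeFunction.apply(existingEntry.getValue(), newEntry.getValue()));
        set.add(existingEntry);
    }
    else {
        // Caveat: with a value-only comparator, the TreeSet treats an entry whose value
        // equals an existing one as a duplicate and does not add it.
        set.add(newEntry);
    }
}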

This is how you would use it, comparing the entries by value (in reverse order) and using Long::sum as the merge function for key collisions.

Comparator<Map.Entry<String,Long>> comparator = Map.Entry.comparingByValue(Comparator.reverseOrder());
Map<String, Long> topFive = Stream.of(map1, map2)
        .map(Map::entrySet)
        .flatMap(Collection::stream)
        .collect(toMap(Long::sum, comparator, 5));

4 answers

Answer 1: 3, 2018-03-22 04:58:45
Answer 2: 0, 2018-03-21 14:23:30
Answer 3: 0, 2018-03-21 15:44:53
Answer 4: 0, 2018-03-24 03:50:16
