
Java 8 Streams: Count the occurrences of elements (List<String> list1) in a list of text data (List<String> list2)

Input:

List<String> elements = new ArrayList<>();
elements.add("Oranges");
elements.add("Figs");
elements.add("Mangoes");
elements.add("Apple");

List<String> listofComments = new ArrayList<>();
listofComments.add("Apples are better than Oranges");
listofComments.add("I love Mangoes and Oranges");
listofComments.add("I don't know like Figs. Mangoes are my favorites");
listofComments.add("I love Mangoes and Apples");

Output: [Mangoes, Apple, Oranges, Figs]. The output must be in descending order of the number of occurrences of each element. Elements that occur the same number of times must be arranged alphabetically.

I am new to Java 8 and came across this problem. I managed to solve it partially, but I couldn't sort the result. Can anyone help me write better code?

My piece of code:

Function<String, Map<String, Long>> function = f -> {
            Long count = listofComments.stream()
                    .filter(e -> e.toLowerCase().contains(f.toLowerCase())).count();
            Map<String, Long> map = new HashMap<>(); //creates map for every element. Is it right?
            map.put(f, count);
            return map;
        };

elements.stream().sorted().map(function).forEach(e-> System.out.print(e));

Output: {Apple=2}{Figs=1}{Mangoes=3}{Oranges=2}

In real-life scenarios, you have to consider that applying an arbitrary number of match operations to an arbitrary number of comments can become quite expensive as the numbers grow, so it's worth doing some preparation:

Map<String,Predicate<String>> filters = elements.stream()
    .sorted(String.CASE_INSENSITIVE_ORDER)
    .map(s -> Pattern.compile(s, Pattern.LITERAL|Pattern.CASE_INSENSITIVE))
    .collect(Collectors.toMap(Pattern::pattern, Pattern::asPredicate,
        (a,b) -> { throw new AssertionError("duplicates"); }, LinkedHashMap::new));

The Pattern class is quite valuable even when not doing regex matching. The combination of the LITERAL and CASE_INSENSITIVE flags enables searches with the intended semantics without the need to convert entire strings to lower case (which, by the way, is not sufficient for all possible scenarios). For this kind of matching, the preparation may include building the necessary data structures for the Boyer–Moore algorithm internally, for a more efficient search.

This map can be reused.
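To illustrate what the LITERAL | CASE_INSENSITIVE combination buys, here is a small sketch. The class LiteralMatchDemo and the helper literalIgnoreCase are hypothetical names used only for illustration. Pattern.asPredicate() tests with find(), so it checks for a substring match, just like contains().

```java
import java.util.function.Predicate;
import java.util.regex.Pattern;

public class LiteralMatchDemo {
    // Hypothetical helper: a case-insensitive substring test that treats the
    // search text literally, so '.' or '*' are not regex metacharacters.
    static Predicate<String> literalIgnoreCase(String text) {
        return Pattern.compile(text, Pattern.LITERAL | Pattern.CASE_INSENSITIVE)
                      .asPredicate();
    }

    public static void main(String[] args) {
        Predicate<String> figs = literalIgnoreCase("figs.");
        // true: "Figs." occurs in the comment, ignoring case
        System.out.println(figs.test("I don't know like Figs. Mangoes are my favorites"));
        // false: with LITERAL, '.' must match a real dot
        System.out.println(figs.test("Figsx"));

        // Without LITERAL, '.' matches any character, so this prints true
        System.out.println(Pattern.compile("figs.", Pattern.CASE_INSENSITIVE)
                                  .asPredicate().test("Figsx"));

        // true: case-insensitive substring match of "Apple" in "APPLES"
        System.out.println(literalIgnoreCase("Apple").test("APPLES are better than Oranges"));
    }
}
```

Compiling each element as a literal means element names containing regex metacharacters are still searched for verbatim.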

For your specific task, one way to use it would be:

filters.entrySet().stream()
    .map(e -> Map.entry(e.getKey(), listofComments.stream().filter(e.getValue()).count()))
    .sorted(Map.Entry.comparingByValue(Comparator.reverseOrder()))
    .forEachOrdered(e -> System.out.printf("%-7s%3d%n", e.getKey(), e.getValue()));

which, for your example data, will print:

Mangoes  3
Apple    2
Oranges  2
Figs     1

Note that the filters map is already sorted alphabetically, and the sorted operation of the second stream is stable for streams with a defined encounter order. So it only needs to sort by occurrences: entries with equal counts keep their relative order, which is the alphabetical order of the source map.
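The stability argument can be checked in isolation. In this sketch (the class StableSortDemo and its helpers entry and sample are hypothetical), the entries are already in alphabetical order and are sorted by count only, so ties are expected to stay in input order.

```java
import java.util.AbstractMap;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class StableSortDemo {
    // Hypothetical helper; AbstractMap.SimpleEntry keeps this Java 8 compatible.
    static Map.Entry<String, Long> entry(String key, long value) {
        return new AbstractMap.SimpleEntry<>(key, value);
    }

    // The counts from the example, in alphabetical order like the filters map.
    static List<Map.Entry<String, Long>> sample() {
        return Arrays.asList(entry("Apple", 2), entry("Figs", 1),
                             entry("Mangoes", 3), entry("Oranges", 2));
    }

    // Sorts by count only (descending). Stream.sorted is stable for ordered
    // streams, so "Apple" and "Oranges" (both 2) keep their input order.
    static List<String> byCountDescending(List<Map.Entry<String, Long>> alphabetical) {
        return alphabetical.stream()
                .sorted(Map.Entry.comparingByValue(Comparator.reverseOrder()))
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(byCountDescending(sample())); // [Mangoes, Apple, Oranges, Figs]
    }
}
```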

Map.entry(…) requires Java 9 or newer. For Java 8, you'd have to use something like new AbstractMap.SimpleEntry<>(…) instead.
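A Java 8 version of the whole pipeline might look like the sketch below, using the question's data. The class name CountOccurrencesJava8 and the rank method are hypothetical; the explicit <Map.Entry<String, Long>> type witness on map is one way to keep the stream's element type as Map.Entry while constructing AbstractMap.SimpleEntry instances.

```java
import java.util.AbstractMap;
import java.util.Arrays;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class CountOccurrencesJava8 {
    // Returns (element, count) pairs: most frequent first, ties alphabetical.
    static List<Map.Entry<String, Long>> rank(List<String> elements, List<String> comments) {
        // Prepared once: literal, case-insensitive matchers in alphabetical order.
        Map<String, Predicate<String>> filters = elements.stream()
            .sorted(String.CASE_INSENSITIVE_ORDER)
            .map(s -> Pattern.compile(s, Pattern.LITERAL | Pattern.CASE_INSENSITIVE))
            .collect(Collectors.toMap(Pattern::pattern, Pattern::asPredicate,
                (a, b) -> { throw new AssertionError("duplicates"); }, LinkedHashMap::new));

        return filters.entrySet().stream()
            // Java 8 replacement for Map.entry(...)
            .<Map.Entry<String, Long>>map(e -> new AbstractMap.SimpleEntry<>(
                e.getKey(), comments.stream().filter(e.getValue()).count()))
            // Stable sort by count only; ties keep the alphabetical order.
            .sorted(Map.Entry.comparingByValue(Comparator.reverseOrder()))
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> elements = Arrays.asList("Oranges", "Figs", "Mangoes", "Apple");
        List<String> comments = Arrays.asList(
            "Apples are better than Oranges",
            "I love Mangoes and Oranges",
            "I don't know like Figs. Mangoes are my favorites",
            "I love Mangoes and Apples");
        rank(elements, comments)
            .forEach(e -> System.out.printf("%-7s%3d%n", e.getKey(), e.getValue()));
    }
}
```

With the question's data, it prints the same four lines as the Java 9 version.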

You can still modify your function to store a Map.Entry instead of a complete Map:

Function<String, Map.Entry<String, Long>> function = f -> Map.entry(f, listofComments.stream()
        .filter(e -> e.toLowerCase().contains(f.toLowerCase())).count());

and then sort these entries before performing the terminal operation (forEach in your case, to print them):

elements.stream()
        .map(function)
        .sorted(Comparator.comparing(Map.Entry<String, Long>::getValue)
                .reversed().thenComparing(Map.Entry::getKey))
        .forEach(System.out::println);

This will then give you the following output:

Mangoes=3
Apple=2
Oranges=2
Figs=1
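Put together, the Map.Entry-based approach above might look like the following self-contained sketch. It assumes Java 8, so AbstractMap.SimpleImmutableEntry stands in for Map.entry(…); the class name SortedEntriesDemo and the rank method, which returns each entry formatted as a key=value string, are hypothetical.

```java
import java.util.AbstractMap;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

public class SortedEntriesDemo {
    static List<String> rank(List<String> elements, List<String> listofComments) {
        // One (element, count) entry per element, using the question's
        // case-insensitive substring check.
        Function<String, Map.Entry<String, Long>> function = f ->
            new AbstractMap.SimpleImmutableEntry<>(f, listofComments.stream()
                .filter(e -> e.toLowerCase().contains(f.toLowerCase())).count());

        return elements.stream()
            .map(function)
            // Highest count first; equal counts ordered by element name.
            .sorted(Comparator.comparing(Map.Entry<String, Long>::getValue)
                .reversed().thenComparing(Map.Entry::getKey))
            .map(e -> e.getKey() + "=" + e.getValue())
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> elements = Arrays.asList("Oranges", "Figs", "Mangoes", "Apple");
        List<String> listofComments = Arrays.asList(
            "Apples are better than Oranges",
            "I love Mangoes and Oranges",
            "I don't know like Figs. Mangoes are my favorites",
            "I love Mangoes and Apples");
        rank(elements, listofComments).forEach(System.out::println);
    }
}
```

Because the comparator itself breaks ties by key, this version does not rely on the input already being sorted.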

The first thing is to declare an additional class that holds an element and its count:

class ElementWithCount {
    private final String element;
    private final long count;

    ElementWithCount(String element, long count) {
        this.element = element;
        this.count = count;
    }

    String element() {
        return element;
    }

    long count() {
        return count;
    }
}

To compute the count, let's declare an additional method:

static long getElementCount(List<String> listOfComments, String element) {
    return listOfComments.stream()
            .filter(comment -> comment.contains(element))
            .count();
}

Now, to find the result, we need to transform the stream of elements into a stream of ElementWithCount objects, sort that stream by count, then transform it back into a stream of elements and collect it into the result list.

To make this easier, let's define the comparator as a separate variable:

Comparator<ElementWithCount> comparator = Comparator
        .comparing(ElementWithCount::count).reversed()
        .thenComparing(ElementWithCount::element);

Now that all the parts are ready, the final computation is easy:

List<String> result = elements.stream()
        .map(element -> new ElementWithCount(element, getElementCount(listofComments, element)))
        .sorted(comparator)
        .map(ElementWithCount::element)
        .collect(Collectors.toList());
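For reference, here are those pieces assembled into one runnable class. This is a sketch: the outer class name ElementRanking and the rank method are hypothetical, ElementWithCount is nested as a static class so the file compiles on its own, and the sample data is the question's.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

public class ElementRanking {
    // Pairs an element with the number of comments that contain it.
    static final class ElementWithCount {
        private final String element;
        private final long count;

        ElementWithCount(String element, long count) {
            this.element = element;
            this.count = count;
        }

        String element() { return element; }
        long count() { return count; }
    }

    // Case-sensitive substring count, as in getElementCount above.
    static long getElementCount(List<String> listOfComments, String element) {
        return listOfComments.stream()
                .filter(comment -> comment.contains(element))
                .count();
    }

    static List<String> rank(List<String> elements, List<String> listOfComments) {
        // Highest count first; ties broken alphabetically by element.
        Comparator<ElementWithCount> comparator = Comparator
                .comparing(ElementWithCount::count).reversed()
                .thenComparing(ElementWithCount::element);

        return elements.stream()
                .map(element -> new ElementWithCount(element, getElementCount(listOfComments, element)))
                .sorted(comparator)
                .map(ElementWithCount::element)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> elements = Arrays.asList("Oranges", "Figs", "Mangoes", "Apple");
        List<String> listOfComments = Arrays.asList(
                "Apples are better than Oranges",
                "I love Mangoes and Oranges",
                "I don't know like Figs. Mangoes are my favorites",
                "I love Mangoes and Apples");
        System.out.println(rank(elements, listOfComments)); // [Mangoes, Apple, Oranges, Figs]
    }
}
```

Note that contains() is case-sensitive here, unlike the question's toLowerCase() check; for this data both give the same counts.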

You can use Map.Entry instead of a separate class and inline getElementCount, so it becomes a "one-line" solution:

List<String> result = elements.stream()
        .map(element ->
                new AbstractMap.SimpleImmutableEntry<>(element,
                        listofComments.stream()
                                .filter(comment -> comment.contains(element))
                                .count()))
        .sorted(Map.Entry.<String, Long>comparingByValue().reversed().thenComparing(Map.Entry.comparingByKey()))
        .map(Map.Entry::getKey)
        .collect(Collectors.toList());

But it's much harder to understand in this form, so I recommend splitting it into logical parts.
