How can I count the lines that contain a given word using Java Streams?

Question

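My method reads from a text file, looks for the word "the" in each line, and counts the number of lines that contain it. The method does work, but the problem is that I only want lines that contain the word itself, not lines where it appears as a substring of another word.

For example, I don't want "therefore": even though it contains "the", it is not a standalone word.

I am trying to find a way to restrict the lines to those that contain "the" as a word of exactly 3 characters, but I have not been able to.

Here is my current method: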

public static long findThe(String filename) {
    long count = 0;

    try {
        Stream<String> lines = Files.lines(Paths.get(filename));
        count = lines.filter(w -> w.contains("the"))
                .count();
    } catch (IOException x) {
        // TODO Auto-generated catch block
        System.out.println("File: " + filename + " not found");
    }

    System.out.println(count);
    return count;
}

For example, if the text file contains the following lines:

This is the first line
This is the second line
This is the third line
This is the fourth line
Therefore, this is a name.

The method will return 4.

Answer 1

Use a regular expression to enforce word boundaries:

count = lines.filter(w -> w.matches("(?i).*\\bthe\\b.*")).count();

Or, for the general case:

count = lines.filter(w -> w.matches("(?i).*\\b" + search + "\\b.*")).count();
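One caveat with the general case: if search can contain regex metacharacters (for example a dot), concatenating it directly changes the meaning of the pattern. A minimal sketch of one way to guard against that, assuming the search term starts and ends with a word character so that \b still applies. The class and method names here are hypothetical and not part of the answer:

```java
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of the original answer.
final class WholeWordCounter {

    // Counts the lines that contain `search` as a whole word, ignoring case.
    static long countLinesWithWord(List<String> lines, String search) {
        // Pattern.quote turns any regex metacharacters in `search` into literals.
        String regex = "(?i).*\\b" + Pattern.quote(search) + "\\b.*";
        return lines.stream()
                .filter(line -> line.matches(regex))
                .count();
    }
}
```

With the sample lines from the question, countLinesWithWord(lines, "the") gives 4, the same result as the hard-coded version.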

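Details:

\b means "word boundary"
(?i) means "ignore case"

Using word boundaries prevents "Therefore" from matching.

Note that in Java, unlike in many other languages, String#matches() must match the whole string (not merely find a match somewhere within it) to return true, hence the .* at either end of the regex.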
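The whole-string behaviour of String#matches() is easy to check in isolation. The sketch below contrasts it with Matcher#find(), which looks for a match anywhere in the input. The class and method names are made up for this illustration:

```java
import java.util.regex.Pattern;

// Hypothetical demo class; the names are illustrative only.
final class MatchesVersusFind {

    // String#matches succeeds only if the regex covers the entire line.
    static boolean wholeStringMatch(String line, String regex) {
        return line.matches(regex);
    }

    // Matcher#find succeeds if the regex matches anywhere in the line.
    static boolean foundAnywhere(String line, String regex) {
        return Pattern.compile(regex).matcher(line).find();
    }
}
```

For the line "This is the first line", wholeStringMatch with "\\bthe\\b" is false, wholeStringMatch with ".*\\bthe\\b.*" is true, and foundAnywhere with "\\bthe\\b" is true.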

Answer 2

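Update:

Thanks to Holger for the following valuable suggestions:

Better: filter(Pattern.compile("\\bthe\\b", Pattern.CASE_INSENSITIVE).asPredicate()), which avoids repeating the work of Pattern.compile(…) for every line.

and

When posting a complete solution, I would also incorporate try-with-resources, even if the OP did not (or especially when the OP did not).

Updated method definition: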

public static long findThe(String filename) {
    long count = 0;
    try (Stream<String> lines = Files.lines(Paths.get(filename))) {
        count = lines.filter(Pattern.compile("\\bthe\\b", Pattern.CASE_INSENSITIVE).asPredicate()).count();
    } catch (IOException x) {
        System.out.println("File: " + filename + " not found");
    }
    return count;
}
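To try the updated method against the sample from the question, something along these lines should work. It is only a sketch: the class name, the temporary file, and the choice to propagate IOException instead of printing a message are assumptions made for the demo, not part of the answer.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.regex.Pattern;
import java.util.stream.Stream;

// Hypothetical demo class; names and the temp-file setup are illustrative.
final class FindTheDemo {

    // Same filter as the updated method, but the IOException is left to the caller.
    static long findThe(Path file) throws IOException {
        try (Stream<String> lines = Files.lines(file)) {
            return lines
                    .filter(Pattern.compile("\\bthe\\b", Pattern.CASE_INSENSITIVE).asPredicate())
                    .count();
        }
    }

    // Writes the question's sample lines to a temporary file and counts the matches.
    static long countSample() {
        try {
            Path file = Files.createTempFile("find-the", ".txt");
            try {
                Files.write(file, Arrays.asList(
                        "This is the first line",
                        "This is the second line",
                        "This is the third line",
                        "This is the fourth line",
                        "Therefore, this is a name."));
                return findThe(file);
            } finally {
                Files.deleteIfExists(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Calling FindTheDemo.countSample() returns 4: the first four lines contain "the" as a word, and "Therefore" is rejected by the word boundary.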

Original answer:

Replace

w->w.contains("the")

with

w->Pattern.compile("\\bthe\\b", Pattern.CASE_INSENSITIVE).matcher(w).find()

\b denotes a word boundary.
