How to find the number of lines containing a certain word in Java using Java Stream?

Question

My method reads a text file, looks for the word "the" in each line, and counts how many lines contain it. The method works, but I only want lines that contain the word by itself, not lines where it appears as a substring of another word.

For example, I wouldn't want "therefore": it contains "the", but not as a separate word.

I'm trying to find a way to limit the lines to those that contain "the" as a word of exactly 3 characters, but I haven't been able to do that.

Here is my method right now:

public static long findThe(String filename) {
    long count = 0;

    try {
        Stream<String> lines = Files.lines(Paths.get(filename));
        count = lines.filter(w -> w.contains("the"))
                     .count();
    } catch (IOException x) {
        // TODO Auto-generated catch block
        System.out.println("File: " + filename + " not found");
    }

    System.out.println(count);
    return count;
}

For example, if a text file contains these lines:

This is the first line
This is the second line
This is the third line
This is the fourth line
Therefore, this is a name.

The method would return 4.

Answer 1

Use a regex to enforce word boundaries:

count = lines.filter(w -> w.matches("(?i).*\\bthe\\b.*")).count();

or, for the general case:

count = lines.filter(w -> w.matches("(?i).*\\b" + search + "\\b.*")).count();

Details:

\b means "word boundary"
(?i) means "ignore case"

Using word boundaries prevents "Therefore" from matching.

Note that in Java, unlike in many other languages, String#matches() must match the entire string (not just find a match within the string) to return true, hence the .* at either end of the regex.
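To make the difference between whole-string matching and searching concrete, here is a minimal sketch. The class name MatchesVsFind and its method names are made up for illustration; only String#matches, Pattern.compile and Matcher#find are standard library calls.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; names and sample strings are for illustration only.
final class MatchesVsFind {
    // Word-boundary pattern for "the", case-insensitive, compiled once.
    private static final Pattern THE =
            Pattern.compile("\\bthe\\b", Pattern.CASE_INSENSITIVE);

    // String#matches must consume the whole line, so without .* padding
    // this is only true when the line consists of the word alone.
    static boolean wholeLineMatches(String line) {
        return line.matches("(?i)\\bthe\\b");
    }

    // Padding with .* lets matches() succeed when the word occurs anywhere.
    static boolean paddedMatches(String line) {
        return line.matches("(?i).*\\bthe\\b.*");
    }

    // Matcher#find searches inside the line, so no padding is needed.
    static boolean finds(String line) {
        return THE.matcher(line).find();
    }

    public static void main(String[] args) {
        String line = "This is the first line";
        System.out.println(wholeLineMatches(line)); // false
        System.out.println(paddedMatches(line));    // true
        System.out.println(finds(line));            // true
    }
}
```

Both paddedMatches and finds reject "Therefore, this is a name.", because no word boundary follows "The" inside "Therefore".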

Answer 2

Update:

Thanks to Holger for the following valuable recommendations:

Better: filter(Pattern.compile("\\bthe\\b", Pattern.CASE_INSENSITIVE).asPredicate()), which avoids repeating the work of Pattern.compile(…) for every line.

and

When posting a complete solution, I'd also incorporate try-with-resources, even when the OP did not (or especially as the OP did not).

Updated method definition:

public static long findThe(String filename) {
    long count = 0;
    try (Stream<String> lines = Files.lines(Paths.get(filename))) {
        count = lines.filter(Pattern.compile("\\bthe\\b", Pattern.CASE_INSENSITIVE).asPredicate()).count();
    } catch (IOException x) {
        System.out.println("File: " + filename + " not found");
    }
    return count;
}
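If the search word is a parameter rather than the fixed "the", the same approach might be sketched as below. WordLineCounter, containsWord and countLines are hypothetical names, not part of the answers here. Pattern.quote is used on the assumption that the word should be taken literally and not interpreted as a regex, and rethrowing as UncheckedIOException is just one possible way to handle the I/O error.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.function.Predicate;
import java.util.regex.Pattern;
import java.util.stream.Stream;

// Hypothetical helper class; names are illustrative only.
final class WordLineCounter {
    // Predicate that is true for lines containing `word` as a whole word.
    // Pattern.quote makes regex metacharacters in `word` match literally.
    static Predicate<String> containsWord(String word) {
        return Pattern.compile("\\b" + Pattern.quote(word) + "\\b",
                               Pattern.CASE_INSENSITIVE)
                      .asPredicate();
    }

    // Counts matching lines in any stream of lines; the pattern is compiled once.
    static long countLines(Stream<String> lines, String word) {
        return lines.filter(containsWord(word)).count();
    }

    // File-based variant; try-with-resources closes the stream from Files.lines.
    static long countLines(String filename, String word) {
        try (Stream<String> lines = Files.lines(Paths.get(filename))) {
            return countLines(lines, word);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Stream<String> sample = Stream.of(
                "This is the first line",
                "This is the second line",
                "This is the third line",
                "This is the fourth line",
                "Therefore, this is a name.");
        System.out.println(countLines(sample, "the")); // 4
    }
}
```

One caveat: the \b anchors only behave as expected when the word itself starts and ends with a letter, digit, or underscore.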

Original answer:原答案：

Replace

w->w.contains("the")

with

w->Pattern.compile("\\bthe\\b", Pattern.CASE_INSENSITIVE).matcher(w).find()

\b denotes a word boundary.
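As a rough illustration of what counts as a word boundary, the sketch below runs the same pattern against a few sample strings. WordBoundaryDemo and hasWholeWordThe are hypothetical names and the inputs are invented for the example. The underscore case is worth noting: "_" is a word character, so no boundary exists between "the" and "_end".

```java
import java.util.regex.Pattern;

// Hypothetical demo class; sample inputs are invented for illustration.
final class WordBoundaryDemo {
    private static final Pattern THE =
            Pattern.compile("\\bthe\\b", Pattern.CASE_INSENSITIVE);

    // True when "the" occurs with a word boundary on both sides.
    static boolean hasWholeWordThe(String text) {
        return THE.matcher(text).find();
    }

    public static void main(String[] args) {
        System.out.println(hasWholeWordThe("The end."));        // true: case is ignored
        System.out.println(hasWholeWordThe("over the, hill"));  // true: punctuation ends the word
        System.out.println(hasWholeWordThe("the-end"));         // true: '-' is not a word character
        System.out.println(hasWholeWordThe("therefore"));       // false: 'r' follows "the"
        System.out.println(hasWholeWordThe("bathe"));           // false: 'a' precedes "the"
        System.out.println(hasWholeWordThe("the_end"));         // false: '_' is a word character
    }
}
```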

2 answers

Answer 1: score 6, accepted, 2020-12-13 23:36:33

Answer 2: score 3, 2020-12-13 23:29:56
