How can I use a Java Stream to count the lines that contain a given word?

Question

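My method reads a text file, looks for the word "the" in each line, and counts the lines that contain it. The method does work, but the problem is that I only want lines that contain the word itself, not lines where it appears as a substring of another word.

For example, I don't want "Therefore": even though it contains "the", the word does not stand alone.

I am trying to find a way to restrict the lines to those that contain "the" where the word is exactly 3 characters long, but I have not been able to do so.

This is my method right now: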

public static long findThe(String filename) {
    long count = 0;

    try {
        Stream<String> lines = Files.lines(Paths.get(filename));
        count = lines.filter(w -> w.contains("the"))
                .count();
    } catch (IOException x) {
        // TODO Auto-generated catch block
        System.out.println("File: " + filename + " not found");
    }

    System.out.println(count);
    return count;
}

For example, if the text file contains the following lines:

This is the first line
This is the second line
This is the third line
This is the fourth line
Therefore, this is a name.

The method will return 4.

Answer 1

Use a regular expression to enforce word boundaries:

count = lines.filter(w -> w.matches("(?i).*\\bthe\\b.*")).count();

Or, for the general case:

count = lines.filter(w -> w.matches("(?i).*\\b" + search + "\\b.*")).count();

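Details:

\b means "word boundary"
(?i) means "ignore case"

Using word boundaries prevents "Therefore" from matching.

Note that in Java, unlike many other languages, String#matches() must match the entire string (not merely find a match somewhere within it) to return true, hence the .* at either end of the regex.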
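The general case above can be checked against the sample lines from the question. What follows is only a minimal sketch: the class name `WordBoundaryDemo` and the helpers `containsWord` and `countLines` are hypothetical, and wrapping the search term in `Pattern.quote` is an addition that is not part of the answer (it keeps regex metacharacters in the term from being read as regex syntax). `List.of` assumes Java 9 or later.

```java
import java.util.List;
import java.util.regex.Pattern;

class WordBoundaryDemo {

    // Hypothetical helper: true if `line` contains `word` as a whole word, ignoring case.
    // Pattern.quote is an assumption added here; it treats `word` as a literal.
    static boolean containsWord(String line, String word) {
        // String#matches must match the whole line, hence the .* on both sides.
        return line.matches("(?i).*\\b" + Pattern.quote(word) + "\\b.*");
    }

    // Hypothetical helper: counts the lines that contain `word` as a whole word.
    static long countLines(List<String> lines, String word) {
        return lines.stream().filter(line -> containsWord(line, word)).count();
    }

    public static void main(String[] args) {
        List<String> lines = List.of(
                "This is the first line",
                "This is the second line",
                "This is the third line",
                "This is the fourth line",
                "Therefore, this is a name.");

        // "Therefore" is skipped because there is no word boundary after "The".
        System.out.println(countLines(lines, "the")); // prints 4
    }
}
```

Without `Pattern.quote`, a search term such as "a.b" would also match "axb", because the dot would be interpreted as "any character".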

Answer 2

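Update:

Thanks to Holger for the following valuable suggestions:

Better: filter(Pattern.compile("\\bthe\\b", Pattern.CASE_INSENSITIVE).asPredicate()), which avoids repeating the work of Pattern.compile(…) for each line.

and

When posting a complete solution, I would also incorporate try-with-resources, even if the OP didn't (or especially because the OP didn't).

Updated method definition: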

public static long findThe(String filename) {
    long count = 0;
    try (Stream<String> lines = Files.lines(Paths.get(filename))) {
        count = lines.filter(Pattern.compile("\\bthe\\b", Pattern.CASE_INSENSITIVE).asPredicate()).count();
    } catch (IOException x) {
        System.out.println("File: " + filename + " not found");
    }
    return count;
}
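To try the updated approach end to end, the sketch below writes the question's sample lines to a temporary file and counts the matching lines. It is an illustration under a few assumptions rather than the answer's exact code: the class name `FindTheDemo` and the constant `THE` are hypothetical, the method takes a `Path` and lets the `IOException` propagate instead of printing a message, and `List.of` assumes Java 9 or later.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Stream;

class FindTheDemo {

    // Compiled once and reused for every line (hypothetical constant name).
    private static final Pattern THE =
            Pattern.compile("\\bthe\\b", Pattern.CASE_INSENSITIVE);

    // Counts the lines of `file` that contain "the" as a whole word.
    static long findThe(Path file) throws IOException {
        // try-with-resources closes the underlying file handle.
        try (Stream<String> lines = Files.lines(file)) {
            // asPredicate() tests each line with Matcher#find, so no .* is needed.
            return lines.filter(THE.asPredicate()).count();
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("the-demo", ".txt");
        try {
            Files.write(tmp, List.of(
                    "This is the first line",
                    "This is the second line",
                    "This is the third line",
                    "This is the fourth line",
                    "Therefore, this is a name."));
            System.out.println(findThe(tmp)); // prints 4
        } finally {
            Files.deleteIfExists(tmp);
        }
    }
}
```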

Original answer:

Replace

w->w.contains("the")

with

w->Pattern.compile("\\bthe\\b", Pattern.CASE_INSENSITIVE).matcher(w).find()

\b is used for word boundaries.

Answer 1
6 Accepted 2020-12-13 23:36:33

Answer 2
3 2020-12-13 23:29:56
