How can I find the number of lines that contain a certain word in Java using Java Stream?

My method reads a text file, looks for the word "the" in each line, and counts how many lines contain it. The method works, but I only want lines that contain the word by itself, not lines where it appears as a substring of another word.

For example, I wouldn't want "therefore": even though it contains "the", the word is not standing by itself.

I'm trying to find a way to limit the lines to those that contain "the" where the length of the word is exactly 3, but I'm unable to do that.

Here is my method right now:

public static long findThe(String filename) {
    long count = 0;

    try {
        Stream<String> lines = Files.lines(Paths.get(filename));
        count = lines.filter(w -> w.contains("the"))
                .count();
    } catch (IOException x) {
        // TODO Auto-generated catch block
        System.out.println("File: " + filename + " not found");
    }

    System.out.println(count);
    return count;
}

For example, if a text file contains these lines:

This is the first line
This is the second line
This is the third line
This is the fourth line
Therefore, this is a name.

The method would return 4.

Use regex to enforce word boundaries:

count = lines.filter(w -> w.matches("(?i).*\\bthe\\b.*")).count();

or for the general case:

count = lines.filter(w -> w.matches("(?i).*\\b" + search + "\\b.*")).count();
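One caveat with the general case: the concatenation treats search as regex source, so a metacharacter such as . in the search term is interpreted rather than matched literally. A minimal sketch of one way to guard against that, assuming Pattern.quote is acceptable here; the class and the containsWord and countLines helpers are hypothetical names, not part of the answer:

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;
import java.util.regex.Pattern;

final class WholeWordFilter {

    // Hypothetical helper: builds a case-insensitive whole-word predicate.
    // Pattern.quote makes the search term match literally.
    static Predicate<String> containsWord(String search) {
        return Pattern
                .compile("\\b" + Pattern.quote(search) + "\\b", Pattern.CASE_INSENSITIVE)
                .asPredicate();
    }

    // Hypothetical helper: counts the lines that contain the search term as a whole word.
    static long countLines(List<String> lines, String search) {
        return lines.stream().filter(containsWord(search)).count();
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("Costs a.b dollars", "Costs axb dollars");
        // Only the first line contains the literal text "a.b".
        System.out.println(countLines(lines, "a.b")); // prints 1
    }
}
```

Without the quoting, the . would match any character and both sample lines would be counted.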

Details:

  • \b means "word boundary"
  • (?i) means "ignore case"

Using word boundaries prevents "Therefore" from matching.
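To see how the two constructs behave on concrete inputs, here is a small sketch. The class and the hasWordThe method are hypothetical names chosen for illustration, and the sample strings other than the question's lines are made up:

```java
import java.util.regex.Pattern;

final class WordBoundaryDemo {

    // (?i) ignores case; \b requires a word boundary on both sides of "the".
    private static final Pattern THE = Pattern.compile("(?i)\\bthe\\b");

    // Hypothetical helper: true if the line contains "the" as a standalone word.
    static boolean hasWordThe(String line) {
        return THE.matcher(line).find();
    }

    public static void main(String[] args) {
        System.out.println(hasWordThe("This is the first line"));     // true
        System.out.println(hasWordThe("Therefore, this is a name.")); // false: "The" is followed by "r"
        System.out.println(hasWordThe("THE END"));                    // true: case is ignored
        System.out.println(hasWordThe("the-quick"));                  // true: "-" is not a word character
        System.out.println(hasWordThe("the_end"));                    // false: "_" counts as a word character
    }
}
```

The last two cases are worth keeping in mind: a hyphen ends a word for \b, while an underscore does not.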

Note that in Java, unlike many other languages, String#matches() must match the entire string (not just find a match within the string) to return true, hence the .* at either end of the regex.
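The difference between matching the whole string and finding a match inside it can be sketched as follows. The class and method names are hypothetical; only String#matches and Matcher#find come from the standard library:

```java
import java.util.regex.Pattern;

final class MatchesVsFind {

    // String#matches succeeds only if the regex covers the entire string.
    static boolean wholeStringMatch(String line) {
        return line.matches("(?i)\\bthe\\b");
    }

    // Padding with .* on both sides lets the rest of the line be consumed.
    static boolean paddedMatch(String line) {
        return line.matches("(?i).*\\bthe\\b.*");
    }

    // Matcher#find looks for the pattern anywhere in the string, so no padding is needed.
    static boolean findMatch(String line) {
        return Pattern.compile("(?i)\\bthe\\b").matcher(line).find();
    }

    public static void main(String[] args) {
        String line = "This is the first line";
        System.out.println(wholeStringMatch(line)); // false
        System.out.println(paddedMatch(line));      // true
        System.out.println(findMatch(line));        // true
    }
}
```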

Update:

Thanks to Holger for the following valuable recommendations:

Better: filter(Pattern.compile("\\bthe\\b", Pattern.CASE_INSENSITIVE).asPredicate()), which avoids repeating the work of Pattern.compile(…) for every line.

and

When posting a complete solution, I'd also incorporate try-with-resources, even when the OP did not (or especially as the OP did not).

Updated method definition:

public static long findThe(String filename) {
    long count = 0;
    try (Stream<String> lines = Files.lines(Paths.get(filename))) {
        count = lines.filter(Pattern.compile("\\bthe\\b", Pattern.CASE_INSENSITIVE).asPredicate()).count();
    } catch (IOException x) {
        System.out.println("File: " + filename + " not found");
    }
    return count;
}
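For a self-contained check against the sample lines from the question, the sketch below writes them to a temporary file and counts the matches. It is a variation on the method above, not a replacement for it: the class name, the countInSample helper, and the choice to propagate IOException instead of printing a message are all assumptions made for the demo.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.function.Predicate;
import java.util.regex.Pattern;
import java.util.stream.Stream;

final class FindTheDemo {

    // Compiled once and reused for every line.
    private static final Predicate<String> HAS_THE =
            Pattern.compile("\\bthe\\b", Pattern.CASE_INSENSITIVE).asPredicate();

    // Variation for the demo: takes a Path and lets IOException propagate.
    static long findThe(Path file) throws IOException {
        // try-with-resources closes the stream, and with it the underlying file.
        try (Stream<String> lines = Files.lines(file)) {
            return lines.filter(HAS_THE).count();
        }
    }

    // Hypothetical helper: writes the question's sample lines to a temp file and counts them.
    static long countInSample() {
        try {
            Path tmp = Files.createTempFile("find-the", ".txt");
            try {
                Files.write(tmp, Arrays.asList(
                        "This is the first line",
                        "This is the second line",
                        "This is the third line",
                        "This is the fourth line",
                        "Therefore, this is a name."));
                return findThe(tmp);
            } finally {
                Files.deleteIfExists(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(countInSample()); // prints 4
    }
}
```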

Original answer:

Replace

w->w.contains("the")

with

w->Pattern.compile("\\bthe\\b", Pattern.CASE_INSENSITIVE).matcher(w).find()

\b denotes a word boundary.
