How can I find the number of lines that contain a certain word in Java using Java Streams?
My method reads a text file, looks for the word "the" in each line, and counts how many lines contain it. The method works, but I only want lines that contain "the" as a word by itself, not lines where it only appears as a substring of another word.
For example, I don't want "therefore" to count: it contains "the", but not as a standalone word.
I tried to restrict the filter to lines that contain "the" as a word exactly 3 characters long, but I couldn't get that to work.
Here is my method right now:
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.stream.Stream;

public static long findThe(String filename) {
    long count = 0;
    try {
        Stream<String> lines = Files.lines(Paths.get(filename));
        // contains() also matches "the" inside words such as "therefore"
        count = lines.filter(w -> w.contains("the"))
                .count();
    }
    catch (IOException x)
    {
        System.out.println("File: " + filename + " not found");
    }
    System.out.println(count);
    return count;
}
For example, if a text file contains these lines:
This is the first line
This is the second line
This is the third line
This is the fourth line
Therefore, this is a name.
The method should return 4, because "Therefore" must not count.
Use a regex to enforce word boundaries:
count = lines.filter(w -> w.matches("(?i).*\\bthe\\b.*")).count();
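A minimal, self-contained sketch comparing the original contains() filter with the word-boundary regex above. It reads from an in-memory list instead of a file and assumes Java 9+ (for List.of); the class and method names are illustrative only.

```java
import java.util.List;

// Illustrative class: compares substring matching with word-boundary matching
// on an in-memory list, so it runs without an input file (assumes Java 9+).
class ContainsVsWordBoundary {

    // Counts lines containing "the" anywhere, including inside "therefore"
    static long countContains(List<String> lines) {
        return lines.stream().filter(w -> w.contains("the")).count();
    }

    // Counts lines containing "the" as a whole word, ignoring case
    static long countWholeWord(List<String> lines) {
        return lines.stream().filter(w -> w.matches("(?i).*\\bthe\\b.*")).count();
    }

    public static void main(String[] args) {
        List<String> lines = List.of(
                "This is the first line",
                "therefore, not a match",
                "Nothing here");
        System.out.println(countContains(lines));  // 2 ("therefore" is a false positive)
        System.out.println(countWholeWord(lines)); // 1
    }
}
```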
or, for the general case:
count = lines.filter(w -> w.matches("(?i).*\\b" + search + "\\b.*")).count();
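A sketch of the general case as a reusable method, assuming Java 9+. The hypothetical countLinesWithWord helper also wraps the search term in Pattern.quote(…), which the one-liner above does not do; without it, a term such as "a.b" would be treated as a regex and would also match "axb". Because \b sits on both sides of the term, this only behaves as expected for terms that start and end with a word character.

```java
import java.util.List;
import java.util.regex.Pattern;

// Illustrative sketch of the general case (assumes Java 9+ for List.of).
class GeneralWordCount {

    // Hypothetical helper: counts lines that contain `search` as a whole word, ignoring case.
    // Pattern.quote escapes regex metacharacters so the term is matched literally.
    static long countLinesWithWord(List<String> lines, String search) {
        String regex = "(?i).*\\b" + Pattern.quote(search) + "\\b.*";
        return lines.stream().filter(w -> w.matches(regex)).count();
    }

    public static void main(String[] args) {
        List<String> lines = List.of("The cat sat", "axb here", "see a.b now", "THE end");
        System.out.println(countLinesWithWord(lines, "the")); // 2
        System.out.println(countLinesWithWord(lines, "a.b")); // 1 ("axb" is not matched)
    }
}
```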
Details:
\b means "word boundary"
(?i) means "ignore case"
Using word boundaries prevents "Therefore" from matching.
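The two constructs can be checked in isolation. This sketch (class and constant names are illustrative) tests a few single lines against the pattern with and without (?i).

```java
// Illustrative sketch: checks \b and (?i) separately on single lines.
class BoundaryDemo {

    static final String IGNORE_CASE = "(?i).*\\bthe\\b.*";
    static final String CASE_SENSITIVE = ".*\\bthe\\b.*";

    public static void main(String[] args) {
        // The "The" in "Therefore" is followed by the word character 'r', so the trailing \b fails
        System.out.println("Therefore, this is a name.".matches(IGNORE_CASE)); // false
        // (?i) lets a capitalized "The" match
        System.out.println("The end".matches(IGNORE_CASE));    // true
        // Without (?i) the match is case-sensitive
        System.out.println("The end".matches(CASE_SENSITIVE)); // false
        // Punctuation next to the word also counts as a boundary
        System.out.println("Is it the?".matches(IGNORE_CASE)); // true
    }
}
```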
Note that in Java, unlike many other languages, String#matches() must match the entire string (not just find a match within it) to return true, hence the .* at either end of the regex.
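To see the matches()/find() difference concretely, this sketch runs the same word-boundary pattern three ways on one line of the sample file. Matcher.find() is standard java.util.regex; the class name is illustrative.

```java
import java.util.regex.Pattern;

// Illustrative sketch: String#matches() versus Matcher#find() on one sample line.
class MatchesVsFind {

    static final Pattern THE = Pattern.compile("\\bthe\\b", Pattern.CASE_INSENSITIVE);

    public static void main(String[] args) {
        String line = "This is the first line";
        // matches() must consume the whole line, so the bare word pattern fails
        System.out.println(line.matches("(?i)\\bthe\\b"));     // false
        // Padding with .* on both ends lets the whole line match
        System.out.println(line.matches("(?i).*\\bthe\\b.*")); // true
        // find() looks for the pattern anywhere, so no .* is needed
        System.out.println(THE.matcher(line).find());          // true
    }
}
```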
Thanks to Holger for the following valuable recommendations:
Better: filter(Pattern.compile("\\bthe\\b", Pattern.CASE_INSENSITIVE).asPredicate()), which avoids repeating the work of Pattern.compile(…) for every line.
and
When posting a complete solution, I'd also incorporate try-with-resources, even when the OP did not (or especially as the OP did not).
Updated method definition:
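import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.regex.Pattern;
import java.util.stream.Stream;

public static long findThe(String filename) {
    long count = 0;
    // try-with-resources closes the file handle held by the stream
    try (Stream<String> lines = Files.lines(Paths.get(filename))) {
        // asPredicate() uses Matcher.find(), so "the" may appear anywhere in the line
        count = lines.filter(Pattern.compile("\\bthe\\b", Pattern.CASE_INSENSITIVE).asPredicate()).count();
    } catch (IOException x) {
        System.out.println("File: " + filename + " could not be read");
    }
    return count;
}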
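To try the updated method end to end, the sketch below writes the question's sample lines to a temporary file and counts the matches. countLinesWithWord is a hypothetical generalization of findThe: it takes the word as a parameter, quotes it with Pattern.quote(…), and declares IOException instead of printing a message. It assumes Java 9+ for List.of.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Stream;

// Illustrative end-to-end check of the try-with-resources + asPredicate approach.
class FindTheDemo {

    // Hypothetical generalization of findThe: the word is a parameter and I/O errors
    // propagate to the caller instead of being printed.
    static long countLinesWithWord(Path file, String word) throws IOException {
        // Compiled once per call, before the stream runs
        Pattern pattern = Pattern.compile("\\b" + Pattern.quote(word) + "\\b", Pattern.CASE_INSENSITIVE);
        try (Stream<String> lines = Files.lines(file)) {
            return lines.filter(pattern.asPredicate()).count();
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("lines", ".txt");
        try {
            // The sample input from the question
            Files.write(file, List.of(
                    "This is the first line",
                    "This is the second line",
                    "This is the third line",
                    "This is the fourth line",
                    "Therefore, this is a name."));
            System.out.println(countLinesWithWord(file, "the")); // 4
        } finally {
            Files.deleteIfExists(file);
        }
    }
}
```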
Replace
w -> w.contains("the")
with
w -> Pattern.compile("\\bthe\\b", Pattern.CASE_INSENSITIVE).matcher(w).find()
The \b matches a word boundary.
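The matcher(w).find() form calls Pattern.compile(…) inside the lambda, so the pattern is recompiled for every line. A minimal variant (illustrative names, Java 9+ for List.of) compiles the pattern once before the stream and reuses it:

```java
import java.util.List;
import java.util.regex.Pattern;

// Illustrative variant of the matcher(w).find() answer with the Pattern hoisted out of the lambda.
class MatcherFindCount {

    static long count(List<String> lines) {
        // Compiled once per call instead of once per line
        Pattern the = Pattern.compile("\\bthe\\b", Pattern.CASE_INSENSITIVE);
        return lines.stream().filter(w -> the.matcher(w).find()).count();
    }

    public static void main(String[] args) {
        List<String> lines = List.of("the", "Therefore", "on THE table", "bathe");
        System.out.println(count(lines)); // 2 ("Therefore" and "bathe" are not whole words)
    }
}
```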