Java Regex Matcher not giving expected result
I have the following code.
String _partsPattern = "(.*)((\n\n)|(\n)|(.))";
static final Pattern partsPattern = Pattern.compile(_partsPattern);
String text= "PART1: 01/02/03\r\nFindings:no smoking";
Matcher match = partsPattern.matcher(text);
while (match.find()) {
    System.out.println(match.group(1));
    return; // I only care about the first match for this purpose
}
Output: PART1: 01/02/0
I was expecting PART1: 01/02/03. Why is the 3 at the end of the first line of my text missing from the result?
The problem with your regex is that . will not match line separators like \r or \n, so (.*) will stop before \r. Since the last part of your regex
(.*)((\n\n)|(\n)|(.))
    ^^^^^^^^^^^^^^^^^
is required and none of its alternatives can match \r, the engine backtracks: (.*) gives up the final 3, and that last character is stored in (.) instead.
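To make the backtracking visible, here is a minimal sketch that runs the original pattern against the text from the question and prints both group 1 and group 5 (the trailing `(.)`). The class name `DotBacktrackDemo` and the helper `group` are made up for illustration; only `Pattern` and `Matcher` come from the JDK.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DotBacktrackDemo {
    // Hypothetical helper: returns group n of the first match, or null if nothing matches.
    static String group(String regex, String text, int n) {
        Matcher m = Pattern.compile(regex).matcher(text);
        return m.find() ? m.group(n) : null;
    }

    public static void main(String[] args) {
        String text = "PART1: 01/02/03\r\nFindings:no smoking";
        String regex = "(.*)((\n\n)|(\n)|(.))";

        // By default '.' does not match a carriage return.
        System.out.println(Pattern.matches(".", "\r"));   // false

        // (.*) has to give back the '3' so that the mandatory second group can match.
        System.out.println(group(regex, text, 1));        // PART1: 01/02/0
        System.out.println(group(regex, text, 5));        // 3
    }
}
```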
If you don't want to include these line separators in your match, just use the "(.*)$" pattern with the Pattern.MULTILINE flag. The flag makes $ match at the end of each line: it recognizes standard line separators like \r, \r\n or \n, but does not include them in the match.
So try with
String _partsPattern = "(.*)$"; // parentheses are not required now
final Pattern partsPattern = Pattern.compile(_partsPattern, Pattern.MULTILINE);
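For completeness, a small self-contained version of that suggestion, run against the text from the question. The class name `MultilineDemo` and the method `firstLine` are illustrative names, not part of the original code.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MultilineDemo {
    // With MULTILINE, '$' also matches just before a line terminator.
    static final Pattern LINE = Pattern.compile("(.*)$", Pattern.MULTILINE);

    // Hypothetical helper: returns the first line without its line separator.
    static String firstLine(String text) {
        Matcher m = LINE.matcher(text);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String text = "PART1: 01/02/03\r\nFindings:no smoking";
        System.out.println(firstLine(text)); // PART1: 01/02/03
    }
}
```

Only the first match is taken here, as in the question.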
Another approach would be to change your regex to something like (.*)((\r\n)|(\n)|(.)) or (.*)((\r?\n)|(.)), but I am not sure what the purpose of the last (.) would be (I would probably remove it). This is just a variation of your original regex.
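A quick sketch of the second variant, assuming the same input text as in the question. The names `AlternationDemo` and `firstLine` are hypothetical. Note that inside a Java string literal the regex escapes are written with doubled backslashes.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AlternationDemo {
    // \r?\n accepts both Windows (\r\n) and Unix (\n) line endings.
    static final Pattern PARTS = Pattern.compile("(.*)((\\r?\\n)|(.))");

    // Hypothetical helper: returns group 1 of the first match, or null.
    static String firstLine(String text) {
        Matcher m = PARTS.matcher(text);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String text = "PART1: 01/02/03\r\nFindings:no smoking";
        System.out.println(firstLine(text)); // PART1: 01/02/03
    }
}
```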
It works for me, giving "PART1: 01/02/03 ". So my guess is that in the real code you read the text some other way, maybe with a reader's readLine (such as BufferedReader.readLine), and erroneously strip a carriage return + linefeed. Far-fetched, but I cannot imagine otherwise. (readLine strips the newline itself.)
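The readLine guess can be checked with a short sketch. It assumes the text is read through a `BufferedReader` over a `StringReader`, which is an illustration only, since the question does not show how the real input is read. The class and method names are hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ReadLineDemo {
    // Reads the first line; readLine() returns it without the line terminator.
    static String firstLine(String text) {
        try (BufferedReader reader = new BufferedReader(new StringReader(text))) {
            return reader.readLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Applies the original pattern from the question and returns group 1.
    static String originalGroup1(String line) {
        Matcher m = Pattern.compile("(.*)((\n\n)|(\n)|(.))").matcher(line);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String line = firstLine("PART1: 01/02/03\r\nFindings:no smoking");
        System.out.println(line);                 // PART1: 01/02/03
        // With no line terminator left, (.) has to consume the final '3'.
        System.out.println(originalGroup1(line)); // PART1: 01/02/0
    }
}
```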