
Java Regex Matcher not giving expected result

I have the following code.

String _partsPattern = "(.*)((\n\n)|(\n)|(.))";
static final Pattern partsPattern = Pattern.compile(_partsPattern);
String text = "PART1: 01/02/03\r\nFindings:no smoking";
Matcher match = partsPattern.matcher(text);
while (match.find()) {
    System.out.println(match.group(1));
    return; // I only care about the first match for this purpose
}

Output: PART1: 01/02/0

I was expecting PART1: 01/02/03. Why is the 3 at the end of my text missing from the result?

The problem with your regex is that . does not match line separators such as \r or \n, so (.*) stops before \r. Since the last part of your regex

(.*)((\n\n)|(\n)|(.))
     ^^^^^^^^^^^^^^^

is required and none of its alternatives can match \r, the matcher backtracks by one character, and the last character (the 3) ends up in (.) instead of in group 1.
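The backtracking described above can be observed by printing both groups. This is a minimal sketch using the pattern and text from the question; the class name BacktrackDemo and the helper firstMatchGroups are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BacktrackDemo {
    // Hypothetical helper: returns group 1 and group 2 of the first match,
    // or null if the regex does not match at all.
    static String[] firstMatchGroups(String regex, String text) {
        Matcher m = Pattern.compile(regex).matcher(text);
        return m.find() ? new String[] { m.group(1), m.group(2) } : null;
    }

    public static void main(String[] args) {
        String text = "PART1: 01/02/03\r\nFindings:no smoking";
        String[] groups = firstMatchGroups("(.*)((\n\n)|(\n)|(.))", text);
        // (.*) has to give back one character so that (.) can match.
        System.out.println("group 1: " + groups[0]); // group 1: PART1: 01/02/0
        System.out.println("group 2: " + groups[1]); // group 2: 3
    }
}
```

Group 2 holding the 3 shows that the character was not lost, only captured by the trailing alternative.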

If you don't want to include these line separators in your match, just use the pattern "(.*)$" with the Pattern.MULTILINE flag, which makes $ match at the end of each line. It recognizes the standard line separators such as \r, \r\n, or \n but does not include them in the match.

So try:

String _partsPattern = "(.*)$"; // parentheses are no longer required
final Pattern partsPattern = Pattern.compile(_partsPattern, Pattern.MULTILINE);
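To see what the flag changes, the sketch below runs "(.*)$" on the question's text with and without Pattern.MULTILINE and prints group 1 of the first match. The class name MultilineDemo and the helper firstGroup are assumptions for illustration only.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MultilineDemo {
    // Hypothetical helper: group 1 of the first match of "(.*)$",
    // compiled with the given flags (0 means no flags).
    static String firstGroup(String text, int flags) {
        Matcher m = Pattern.compile("(.*)$", flags).matcher(text);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String text = "PART1: 01/02/03\r\nFindings:no smoking";
        // With MULTILINE, $ matches just before the \r\n separator.
        System.out.println(firstGroup(text, Pattern.MULTILINE)); // PART1: 01/02/03
        // Without the flag, $ only matches at the very end of the input here,
        // so the first match is the last line.
        System.out.println(firstGroup(text, 0)); // Findings:no smoking
    }
}
```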

Another approach would be to change your regex to something like (.*)((\r\n)|(\n)|(.)) or (.*)((\r?\n)|(.)), but I am not sure what the purpose of the last (.) would be (I would probably remove it). This is just a variation of your original regex.
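As a rough check of the second variant, the sketch below collects group 1 of every match on the question's text. The class name SeparatorVariantDemo and the helper groupOnes are hypothetical. It also hints at why dropping the trailing (.) may be a good idea: on the last line, which has no separator, (.) takes the final character.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SeparatorVariantDemo {
    // Hypothetical helper: group 1 of every match of (.*)((\r?\n)|(.)).
    static List<String> groupOnes(String text) {
        Matcher m = Pattern.compile("(.*)((\\r?\\n)|(.))").matcher(text);
        List<String> result = new ArrayList<>();
        while (m.find()) {
            result.add(m.group(1));
        }
        return result;
    }

    public static void main(String[] args) {
        String text = "PART1: 01/02/03\r\nFindings:no smoking";
        // First line: \r?\n consumes the separator, so group 1 is complete.
        // Last line: no separator follows, so (.) takes the final "g".
        System.out.println(groupOnes(text)); // [PART1: 01/02/03, Findings:no smokin]
    }
}
```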

It works for me, giving "PART1: 01/02/03 ". So my guess is that in the real code you read the text with something like Reader.readLine and mistakenly strip a carriage return + line feed. It is far-fetched, but I cannot imagine another cause. (readLine strips the newline itself.)
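The parenthetical about readLine can be checked with a small sketch. It assumes the text is read through a BufferedReader (the question does not show how the real code reads its input); the class name ReadLineDemo and the helper firstLine are made up for illustration.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ReadLineDemo {
    // Hypothetical helper: the first line of the text as returned by
    // BufferedReader.readLine, which omits the line terminator.
    static String firstLine(String text) {
        try (BufferedReader reader = new BufferedReader(new StringReader(text))) {
            return reader.readLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String line = firstLine("PART1: 01/02/03\r\nFindings:no smoking");
        System.out.println(line); // PART1: 01/02/03

        // With no separator left in the line, the question's pattern
        // still needs (.) to consume one character.
        Matcher m = Pattern.compile("(.*)((\n\n)|(\n)|(.))").matcher(line);
        if (m.find()) {
            System.out.println(m.group(1)); // PART1: 01/02/0
        }
    }
}
```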
