
Java Regex Matcher not giving expected result

I have the following code:

import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class PartsExample {
    static final String _partsPattern = "(.*)((\n\n)|(\n)|(.))";
    static final Pattern partsPattern = Pattern.compile(_partsPattern);
    public static void main(String[] args) {
        String text = "PART1: 01/02/03\r\nFindings:no smoking";
        Matcher match = partsPattern.matcher(text);
        while (match.find()) {
            System.out.println(match.group(1));
            return; // I only care about the first match for this purpose
        }
    }
}


Output: "PART1: 01/02/0". I was expecting "PART1: 01/02/03". Why is the 3 at the end of my text not included in the result?

The problem with your regex is that . does not match line separators such as \r or \n, so (.*) stops right before the \r. Since the last part of your regex, ((\n\n)|(\n)|(.)), is required and none of its alternatives can match \r, the engine backtracks: (.*) gives up its last character, the 3, so that (.) can match it. That is why the last character ends up in (.) instead of in group 1.
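To see this backtracking in action, here is a minimal Java sketch that runs the question's pattern on the sample text and prints group 1 (the (.*)) and group 5 (the final (.)). The class BacktrackDemo and the method firstMatch are hypothetical names used only for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// BacktrackDemo and firstMatch are illustrative names, not from the original post.
class BacktrackDemo {
    // Same pattern as in the question; "\n" here is a real newline character.
    // Groups: 1 = (.*), 2 = the whole alternation, 3 = (\n\n), 4 = (\n), 5 = (.)
    static final Pattern PARTS = Pattern.compile("(.*)((\n\n)|(\n)|(.))");

    // Returns {group(1), group(5)} of the first match, or null if nothing matches.
    static String[] firstMatch(String text) {
        Matcher m = PARTS.matcher(text);
        if (!m.find()) {
            return null;
        }
        return new String[] {m.group(1), m.group(5)};
    }

    public static void main(String[] args) {
        String text = "PART1: 01/02/03\r\nFindings:no smoking";
        // By default '.' does not match a carriage return.
        System.out.println(Pattern.matches(".", "\r")); // false
        String[] groups = firstMatch(text);
        System.out.println("group(1) = " + groups[0]); // PART1: 01/02/0
        System.out.println("group(5) = " + groups[1]); // 3
    }
}
```

(.*) first grabs "PART1: 01/02/03", fails to find a continuation at the \r, and then hands the 3 back so that (.) can succeed.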

If you don't want to include these line separators in your match, just use the pattern "(.*)$" with the Pattern.MULTILINE flag. That flag makes $ match at the end of each line, that is, just before a standard line terminator such as \r, \n or \r\n, without including the terminator in the match.

So try:

String _partsPattern = "(.*)$"; // parentheses are no longer required: match.group() returns the same text as group(1)
final Pattern partsPattern = Pattern.compile(_partsPattern, Pattern.MULTILINE);
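A self-contained sketch of the MULTILINE approach, assuming (as in the question) that only the first match matters. MultilineDemo and firstLine are hypothetical names; the sketch checks the same line with a \r\n terminator, a \n terminator, and no terminator at all.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// MultilineDemo and firstLine are hypothetical names used only for this sketch.
class MultilineDemo {
    // With MULTILINE, '$' matches just before a line terminator (or at the end
    // of input), so the terminator itself is never part of the match.
    static final Pattern LINE = Pattern.compile("(.*)$", Pattern.MULTILINE);

    // Returns group(1) of the first match, or null if there is none.
    static String firstLine(String text) {
        Matcher m = LINE.matcher(text);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(firstLine("PART1: 01/02/03\r\nFindings:no smoking")); // PART1: 01/02/03
        System.out.println(firstLine("PART1: 01/02/03\nFindings:no smoking"));   // PART1: 01/02/03
        System.out.println(firstLine("PART1: 01/02/03"));                        // PART1: 01/02/03
    }
}
```

All three inputs give the full first line, because nothing in the pattern has to consume a character after (.*).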

Another approach would be to change your regex to something like (.*)((\r\n)|(\n)|(.)) or (.*)((\r?\n)|(.)), but I am not sure what the purpose of the last (.) is (I would probably remove it). These are just variations of your original regex.
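A quick check of both variants on the question's text, written as Java string literals (so "\\r" reaches the regex engine as the escape \r). AlternativesDemo, firstGroup and the third pattern without the trailing (.) are hypothetical additions for illustration, not part of the original answer.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// AlternativesDemo and firstGroup are illustrative names, not from the original answer.
class AlternativesDemo {
    static final Pattern CRLF_OR_LF = Pattern.compile("(.*)((\\r\\n)|(\\n)|(.))");
    static final Pattern OPTIONAL_CR = Pattern.compile("(.*)((\\r?\\n)|(.))");
    // Hypothetical variant with the trailing (.) removed, as the answer suggests.
    static final Pattern NO_TRAILING_DOT = Pattern.compile("(.*)\\r?\\n");

    // Returns group(1) of the first match, or null if the pattern finds nothing.
    static String firstGroup(Pattern p, String text) {
        Matcher m = p.matcher(text);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String text = "PART1: 01/02/03\r\nFindings:no smoking";
        System.out.println(firstGroup(CRLF_OR_LF, text));      // PART1: 01/02/03
        System.out.println(firstGroup(OPTIONAL_CR, text));     // PART1: 01/02/03
        System.out.println(firstGroup(NO_TRAILING_DOT, text)); // PART1: 01/02/03
        // Without (.), a last line that has no terminator does not match at all.
        System.out.println(firstGroup(NO_TRAILING_DOT, "Findings:no smoking")); // null
    }
}
```

Dropping (.) is reasonable as long as every line you care about ends with a terminator; otherwise the final line is silently skipped.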

It works for me, giving "PART1: 01/02/03". So my guess is that in your real code you read the text, maybe with BufferedReader.readLine, and mistakenly strip a carriage return + line feed yourself. Far-fetched, but I cannot think of another explanation. (readLine already strips the line terminator itself.)
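The effect of reading line by line can be sketched as follows, assuming the text is read with BufferedReader.readLine. Once the terminator is gone, the question's regex still loses the 3, because the required final group can only be satisfied by (.) taking one more character. ReadLineDemo, firstLine and firstGroup are hypothetical names.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// ReadLineDemo, firstLine and firstGroup are hypothetical names for this sketch.
class ReadLineDemo {
    // The question's pattern, unchanged.
    static final Pattern PARTS = Pattern.compile("(.*)((\n\n)|(\n)|(.))");

    // Reads the first line; readLine() returns it without the trailing "\r\n".
    static String firstLine(String text) {
        try (BufferedReader reader = new BufferedReader(new StringReader(text))) {
            return reader.readLine();
        } catch (IOException e) {
            // Reading from a StringReader should not fail; rethrow unchecked to keep callers simple.
            throw new UncheckedIOException(e);
        }
    }

    // Returns group(1) of the first match, or null if there is none.
    static String firstGroup(String text) {
        Matcher m = PARTS.matcher(text);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String line = firstLine("PART1: 01/02/03\r\nFindings:no smoking");
        System.out.println(line);             // PART1: 01/02/03
        System.out.println(firstGroup(line)); // PART1: 01/02/0
    }
}
```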

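Notice: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.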

Guangdong ICP registration No. 18138465  © 2020-2024 STACKOOM.COM