Java :: Parsing a multiline text with regular expressions
I want to parse a multiline text, so I wrote something like this:
String text = "[timestamp1] INFO - Message1 \r\n"
        + "[timestamp2] ERROR - Message2 \r\n"
        + "[timestamp3] INFO - Message3 \r\n"
        + "Message3_details1......... \r\n"
        + "Message3_details2 ......... \r\n";
String regex = "\\[(.*)\\] (.*) - (.*)";
Pattern p = Pattern.compile(regex, Pattern.DOTALL);
Matcher m = p.matcher(text);
while (m.find()) {
    System.out.println("G1: " + m.group(1));
    System.out.println("G2: " + m.group(2));
    System.out.println("G3: " + m.group(3));
    System.out.println();
}
What I want to get is this:
G1: timestamp1
G2: INFO
G3: message1
G1: timestamp2
G2: ERROR
G3: message2
G1: timestamp3
G2: INFO
G3: message3
message_details1....
message_details2...
But what I get is this:
G1: timestamp1] INFO - Message1
[timestamp2] ERROR - Message2
[timestamp3
G2: INFO
G3: Message3
Message3_details1........
Message3_details2........
I'm not able to solve that even with Google's help.
You have used a greedy quantifier in your regex. As a result, the .* in \\[(.*)\\] consumes everything up to the last ] that still lets the rest of the pattern match. You need a reluctant quantifier instead: add a ? after .* to turn it into .*? so that it matches as little as possible.
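To make the difference concrete, here is a minimal sketch that runs a greedy and a reluctant version of the bracket pattern against a single line. The class name GreedyVsReluctant, the helper firstGroup, and the sample line are made up for illustration; they are not part of the original question.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GreedyVsReluctant {
    // Hypothetical helper: returns group 1 of the first match, or null if nothing matches.
    static String firstGroup(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String line = "[timestamp1] INFO - [timestamp2] ERROR";

        // Greedy: .* runs to the end of the line, then backtracks to the LAST ']'.
        System.out.println(firstGroup("\\[(.*)\\]", line));
        // prints: timestamp1] INFO - [timestamp2

        // Reluctant: .*? grows one character at a time and stops at the FIRST ']'.
        System.out.println(firstGroup("\\[(.*?)\\]", line));
        // prints: timestamp1
    }
}
```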
Also, for the last .* you need a look-ahead to make it stop before the next [. A reluctant quantifier at the very end of a pattern would otherwise match nothing at all.
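The effect of the look-ahead can be seen in isolation with a small sketch like the following. The class LookaheadDemo, the helper thirdGroup, and the shortened two-entry input are assumptions chosen for illustration only.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LookaheadDemo {
    // Hypothetical helper: returns group 3 of the first match, or null if nothing matches.
    static String thirdGroup(String regex, String input) {
        Matcher m = Pattern.compile(regex, Pattern.DOTALL).matcher(input);
        return m.find() ? m.group(3) : null;
    }

    public static void main(String[] args) {
        String text = "[t1] INFO - first\r\n[t2] ERROR - second\r\n";

        // A trailing reluctant group with nothing after it is satisfied by the empty string.
        String withoutLookahead = thirdGroup("\\[(.*?)\\] (.*?) - (.*?)", text);
        System.out.println("[" + withoutLookahead + "]"); // prints: []

        // The look-ahead (?=\[) forces the group to grow until a '[' follows,
        // without consuming that '[' itself.
        String withLookahead = thirdGroup("\\[(.*?)\\] (.*?) - (.*?)(?=\\[)", text);
        System.out.println(withLookahead.trim()); // prints: first
    }
}
```

Because the look-ahead does not consume the [, the next call to find() can start the following entry at that bracket.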
The following code would work:
String text = "[timestamp1] INFO - Message1 \r\n"
        + "[timestamp2] ERROR - Message2 \r\n"
        + "[timestamp3] INFO - Message3 \r\n"
        + "Message3_details1......... \r\n"
        + "Message3_details2 ......... \r\n";
// Reluctant groups, plus a look-ahead that stops group 3 at the next '[' or at the end of input
String regex = "\\[(.*?)\\] (.*?) - (.*?)(?=\\[|$)";
// DOTALL lets '.' match line terminators, so group 3 can span several lines
Pattern p = Pattern.compile(regex, Pattern.DOTALL);
Matcher m = p.matcher(text);
while (m.find()) {
    System.out.println("G1: " + m.group(1));
    System.out.println("G2: " + m.group(2));
    System.out.println("G3: " + m.group(3));
    System.out.println();
}
The last part of the regex, (.*?)(?=\\[|$), matches everything up to the [ that begins the next entry, or up to the end of the input ($). The $ is required so that the last two lines are captured in group 3 of the last match.
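As a rough check of why the $ alternative matters, the sketch below counts matches with and without it. The class EndAnchorDemo, the helper countMatches, and the shortened input are hypothetical. It assumes Pattern.MULTILINE is not set, so $ anchors at the end of the whole input rather than at every line end.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class EndAnchorDemo {
    // Hypothetical helper: counts how many times the regex matches in the input.
    static int countMatches(String regex, String input) {
        Matcher m = Pattern.compile(regex, Pattern.DOTALL).matcher(input);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String text = "[t1] INFO - first\r\n"
                + "[t2] INFO - second\r\n"
                + "details\r\n";

        // Only a following '[' can end group 3, so the final entry never matches.
        System.out.println(countMatches("\\[(.*?)\\] (.*?) - (.*?)(?=\\[)", text));   // prints: 1

        // With |$ the final entry may also end at the end of the input.
        System.out.println(countMatches("\\[(.*?)\\] (.*?) - (.*?)(?=\\[|$)", text)); // prints: 2
    }
}
```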
Output:
G1: timestamp1
G2: INFO
G3: Message1
G1: timestamp2
G2: ERROR
G3: Message2
G1: timestamp3
G2: INFO
G3: Message3
Message3_details1.........
Message3_details2 .........
Try "\\[(.*?)\\] (.*?) - (.*?) \\r\\n"