
Java :: Parsing a multiline text with regular expressions

I want to parse a multiline text, so I wrote something like this:

String text = "[timestamp1] INFO - Message1 \r\n"
            + "[timestamp2] ERROR - Message2 \r\n"
            + "[timestamp3] INFO - Message3 \r\n"
            + "Message3_details1......... \r\n"
            + "Message3_details2 ......... \r\n";
String regex = "\\[(.*)\\] (.*) - (.*)";
Pattern p = Pattern.compile(regex, Pattern.DOTALL);
Matcher m = p.matcher(text);
while (m.find()) {
    System.out.println("G1: " + m.group(1));
    System.out.println("G2: " + m.group(2));
    System.out.println("G3: " + m.group(3));
    System.out.println();
}

What I want to get is this:

G1: timestamp1
G2: INFO
G3: Message1

G1: timestamp2
G2: ERROR
G3: Message2

G1: timestamp3
G2: INFO
G3: Message3
    Message3_details1.........
    Message3_details2 .........

But what I get is this:

G1: timestamp1] INFO - Message1
    [timestamp2] ERROR - Message2
    [timestamp3
G2: INFO
G3: Message3
    Message3_details1........
    Message3_details2........

I haven't been able to solve this, even with Google's help.

You have used a greedy quantifier in your regex. Because the pattern is compiled with Pattern.DOTALL, . also matches line terminators, so the .* in \[(.*)\] consumes everything up to the last ] in the whole text, across lines. You need a reluctant quantifier instead: add a ? after each .* (that is, write .*?).
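A minimal sketch of the difference, run on a made-up one-line input (the class GreedyVsReluctant and its firstGroup helper are illustrative names, not from the question): the greedy group runs to the last ], while the reluctant one stops at the first ] that lets the rest of the pattern match.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GreedyVsReluctant {

    // Returns group 1 of the first match of regex in input, or null if nothing matches.
    static String firstGroup(String regex, String input) {
        Matcher m = Pattern.compile(regex, Pattern.DOTALL).matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String input = "[timestamp1] INFO [timestamp2] ERROR";
        // Greedy: .* first takes everything, then backtracks only as far as the LAST ']'.
        System.out.println(firstGroup("\\[(.*)\\]", input));  // timestamp1] INFO [timestamp2
        // Reluctant: .*? grows one character at a time and stops at the FIRST ']'.
        System.out.println(firstGroup("\\[(.*?)\\]", input)); // timestamp1
    }
}
```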

Also, for the last .*, you need a lookahead to make it stop just before the next [ without consuming it.
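A sketch of why the stop condition should be a lookahead rather than a consuming [, using a made-up three-entry input and a hypothetical keys helper that collects group 1 of every match. The lookahead only checks that a [ comes next, so the following find() can start matching at that [; if the [ is consumed instead, the next entry loses its opening bracket and is skipped.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LookaheadDemo {

    // Collects group 1 (the bracketed key) of every match of regex in input.
    static List<String> keys(String regex, String input) {
        List<String> result = new ArrayList<>();
        Matcher m = Pattern.compile(regex, Pattern.DOTALL).matcher(input);
        while (m.find()) {
            result.add(m.group(1));
        }
        return result;
    }

    public static void main(String[] args) {
        String input = "[a] one\n[b] two\n[c] three\n";
        // Lookahead: the next '[' is checked but left in place for the next find().
        System.out.println(keys("\\[(.*?)\\] (.*?)(?=\\[|$)", input)); // [a, b, c]
        // Consuming group: each match swallows the following entry's '[',
        // so that entry can no longer match.
        System.out.println(keys("\\[(.*?)\\] (.*?)(?:\\[|$)", input)); // [a, c]
    }
}
```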

The following code works:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

String text = "[timestamp1] INFO - Message1 \r\n"
            + "[timestamp2] ERROR - Message2 \r\n"
            + "[timestamp3] INFO - Message3 \r\n"
            + "Message3_details1......... \r\n"
            + "Message3_details2 ......... \r\n";

// Reluctant .*? in every group; the lookahead (?=\[|$) stops group 3
// just before the next '[' (without consuming it) or at the end of the input.
String regex = "\\[(.*?)\\] (.*?) - (.*?)(?=\\[|$)";

Pattern p = Pattern.compile(regex, Pattern.DOTALL);
Matcher m = p.matcher(text);
while (m.find()) {
    System.out.println("G1: " + m.group(1));
    System.out.println("G2: " + m.group(2));
    System.out.println("G3: " + m.group(3));
    System.out.println();
}

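The last part of the regex, - (.*?)(?=\[|$), matches everything up to the [ at the start of the next line, or up to the end of the input ($). Without Pattern.MULTILINE, $ matches only at the end of the input (or just before a final line terminator), not at every line end. The $ is required for the last entry: no [ follows it, so without $ the lookahead could never succeed and the last match, whose group 3 should include the two detail lines, would not be found at all.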

Output:

G1: timestamp1
G2: INFO
G3: Message1 


G1: timestamp2
G2: ERROR
G3: Message2 


G1: timestamp3
G2: INFO
G3: Message3 
Message3_details1......... 
Message3_details2 ......... 
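The role of $ described above can be checked with a small sketch (DollarDemo, countMatches and lastGroup3 are hypothetical names) that runs the answer's regex on the question's text with and without the |$ alternative:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DollarDemo {

    static final String TEXT = "[timestamp1] INFO - Message1 \r\n"
            + "[timestamp2] ERROR - Message2 \r\n"
            + "[timestamp3] INFO - Message3 \r\n"
            + "Message3_details1......... \r\n"
            + "Message3_details2 ......... \r\n";

    // Counts how many times regex matches TEXT.
    static int countMatches(String regex) {
        Matcher m = Pattern.compile(regex, Pattern.DOTALL).matcher(TEXT);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    // Returns group 3 of the last match, or null if nothing matched.
    static String lastGroup3(String regex) {
        Matcher m = Pattern.compile(regex, Pattern.DOTALL).matcher(TEXT);
        String last = null;
        while (m.find()) {
            last = m.group(3);
        }
        return last;
    }

    public static void main(String[] args) {
        String withEnd = "\\[(.*?)\\] (.*?) - (.*?)(?=\\[|$)";
        String withoutEnd = "\\[(.*?)\\] (.*?) - (.*?)(?=\\[)";
        System.out.println(countMatches(withEnd));    // 3
        // No '[' follows the third entry, so its lookahead can never succeed.
        System.out.println(countMatches(withoutEnd)); // 2
        // With $, group 3 of the last match runs through the detail lines.
        System.out.println(lastGroup3(withEnd).contains("Message3_details2")); // true
    }
}
```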

Try "\\[(.*?)\\] (.*?) - (.*?) \\r\\n".
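Assuming this pattern is dropped into the question's loop with the same Pattern.DOTALL flag, a quick check (CrLfAlternative and group3s are hypothetical names) indicates that group 3 ends at the first " \r\n" of each entry, so the two detail lines under Message3 are not captured:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CrLfAlternative {

    static final String TEXT = "[timestamp1] INFO - Message1 \r\n"
            + "[timestamp2] ERROR - Message2 \r\n"
            + "[timestamp3] INFO - Message3 \r\n"
            + "Message3_details1......... \r\n"
            + "Message3_details2 ......... \r\n";

    // Collects group 3 of every match, using the same DOTALL flag as the question.
    static List<String> group3s(String regex) {
        List<String> result = new ArrayList<>();
        Matcher m = Pattern.compile(regex, Pattern.DOTALL).matcher(TEXT);
        while (m.find()) {
            result.add(m.group(3));
        }
        return result;
    }

    public static void main(String[] args) {
        // The reluctant group 3 stops at the first " \r\n", i.e. at the end of each entry's first line;
        // the detail lines contain no '[', so no later match picks them up.
        System.out.println(group3s("\\[(.*?)\\] (.*?) - (.*?) \\r\\n")); // [Message1, Message2, Message3]
    }
}
```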
