Java: Parsing multiline text using regular expressions

Question

I want to parse a multiline text, so I wrote something like this:

String text = "[timestamp1] INFO - Message1 \r\n"
            + "[timestamp2] ERROR - Message2 \r\n"
            + "[timestamp3] INFO - Message3 \r\n"
            + "Message3_details1......... \r\n"
            + "Message3_details2 ......... \r\n";
String regex = "\\[(.*)\\] (.*) - (.*)";
Pattern p = Pattern.compile(regex, Pattern.DOTALL);
Matcher m = p.matcher(text);
while (m.find()) {
    System.out.println("G1: " + m.group(1));
    System.out.println("G2: " + m.group(2));
    System.out.println("G3: " + m.group(3));
    System.out.println();
}

What I want to get is this:

G1: timestamp1
G2: INFO
G3: message1

G1: timestamp2
G2: ERROR
G3: message2

G1: timestamp3
G2: INFO
G3: message3
    message_details1....
    message_details2...

But what I actually get is this:

G1: timestamp1] INFO - Message1
    [timestamp2] ERROR - Message2
    [timestamp3
G2: INFO
G3: Message3
    Message3_details1........
    Message3_details2........

I haven't been able to solve this, even with Google's help.

Answer 1

You have used a greedy quantifier in your regex, so the .* in \[(.*)\] consumes everything up to the last ] in the input. You need a reluctant quantifier instead: add a ? after .* .
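To make the difference concrete, here is a minimal sketch that runs a greedy and a reluctant version of the bracket pattern against a shortened input. The class name QuantifierDemo, the helper firstGroup, and the sample string are made up for illustration; only the two patterns come from the discussion above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class QuantifierDemo {
    // Hypothetical helper: returns group 1 of the first match, or null if nothing matches.
    public static String firstGroup(String regex, String input) {
        Matcher m = Pattern.compile(regex, Pattern.DOTALL).matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // Shortened sample input, assumed for this sketch.
        String input = "[t1] INFO - a \r\n[t2] ERROR - b \r\n";

        // Greedy: .* runs to the end of the input and backtracks to the LAST ']'.
        System.out.println("greedy:    " + firstGroup("\\[(.*)\\]", input));

        // Reluctant: .*? grows one character at a time and stops at the FIRST ']'.
        System.out.println("reluctant: " + firstGroup("\\[(.*?)\\]", input));
    }
}
```

With DOTALL set, the greedy version captures across the line break up to the bracket that closes t2, while the reluctant version captures only t1.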

Also, for the last .* , you need a look-ahead to make it stop before the next [ .

The following code would work:

String text = "[timestamp1] INFO - Message1 \r\n"
            + "[timestamp2] ERROR - Message2 \r\n"
            + "[timestamp3] INFO - Message3 \r\n"
            + "Message3_details1......... \r\n"
            + "Message3_details2 ......... \r\n";

String regex = "\\[(.*?)\\] (.*?) - (.*?)(?=\\[|$)";

Pattern p = Pattern.compile(regex, Pattern.DOTALL);
Matcher m = p.matcher(text);
while (m.find()) {
    System.out.println("G1: " + m.group(1));
    System.out.println("G2: " + m.group(2));
    System.out.println("G3: " + m.group(3));
    System.out.println();
}

The last part of the regex, (.*?)(?=\\[|$), matches everything up to the next [ (the start of the following entry) or, if there is none, up to the end of the input ( $ ). The $ is required so that the last two lines are captured in group 3 of the last match.

Output:

G1: timestamp1
G2: INFO
G3: Message1 


G1: timestamp2
G2: ERROR
G3: Message2 


G1: timestamp3
G2: INFO
G3: Message3 
Message3_details1......... 
Message3_details2 .........
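As a rough check of why the $ alternative in the look-ahead matters, the sketch below counts matches with and without it on the same input. The class name LookaheadDemo and the helper messages are illustrative names, not part of the answer above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LookaheadDemo {
    public static final String TEXT = "[timestamp1] INFO - Message1 \r\n"
            + "[timestamp2] ERROR - Message2 \r\n"
            + "[timestamp3] INFO - Message3 \r\n"
            + "Message3_details1......... \r\n"
            + "Message3_details2 ......... \r\n";

    // Hypothetical helper: collects group 3 (the message body) of every match.
    public static List<String> messages(String regex, String text) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex, Pattern.DOTALL).matcher(text);
        while (m.find()) {
            out.add(m.group(3));
        }
        return out;
    }

    public static void main(String[] args) {
        String withEnd = "\\[(.*?)\\] (.*?) - (.*?)(?=\\[|$)";
        String withoutEnd = "\\[(.*?)\\] (.*?) - (.*?)(?=\\[)";

        // With |$ the last entry can end at the end of the input: 3 matches.
        System.out.println(messages(withEnd, TEXT).size());

        // Without it, the last entry has no following '[' and is dropped: 2 matches.
        System.out.println(messages(withoutEnd, TEXT).size());
    }
}
```

Note that the look-ahead stops at any [ character, so a message body that itself contains a bracket would be cut short at that point.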

Answer 2

Try "\\[(.*?)\\] (.*?) - (.*?) \\r\\n"
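For comparison, here is a small sketch of what this pattern captures on the question's input. It assumes the pattern is compiled without any flags, since the answer does not say; the class name LineEndDemo and the helper messages are invented for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LineEndDemo {
    public static final String TEXT = "[timestamp1] INFO - Message1 \r\n"
            + "[timestamp2] ERROR - Message2 \r\n"
            + "[timestamp3] INFO - Message3 \r\n"
            + "Message3_details1......... \r\n"
            + "Message3_details2 ......... \r\n";

    // Hypothetical helper: collects group 3 of every match (no flags: an assumption).
    public static List<String> messages(String regex, String text) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(text);
        while (m.find()) {
            out.add(m.group(3));
        }
        return out;
    }

    public static void main(String[] args) {
        // Each match ends at the first " \r\n", i.e. at the end of the entry's first line.
        String regex = "\\[(.*?)\\] (.*?) - (.*?) \\r\\n";
        System.out.println(messages(regex, TEXT)); // [Message1, Message2, Message3]
    }
}
```

Because every match stops at the first line break, the two detail lines after Message3 are not captured, so this variant only fits entries that occupy a single line.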


Answer 1: score 4, accepted, 2013-10-07 10:39:41

Answer 2: score 0, 2013-10-07 10:52:51
