How do I capture a multiline pattern with a regular expression in Java?

Question

I have a text file that I need to parse using regular expressions. The text that I need to capture is in multiline groups like this:

truck
zDoug
Doug's house
(123) 456-7890
Edoug@doug.com
30
61234.56
8/10/2003

vehicle
eRob
Rob's house
(987) 654-3210
Frob@rob.com

For this example I need to capture truck followed by the next seven lines. In other words, in this "block" I have 8 groups. This is what I've tried, but it will not capture the next line:

(truck)\n(\w).

NOTE: I'm using the program RegExr to test my regex before I port it to Java.

Answer 1

(?m)^truck(?:(?:\r\n|[\r\n]).+$)*

This assumes the whole text has been read into a single string (i.e., you're not reading a file line-by-line), but it doesn't assume the line separator is always \n, as your code does. At the minimum you should allow for \r\n and \r as well, which is what (?:\r\n|[\r\n]) does. But it still matches only one separator, so the match stops before the double line separator at the end of the block.
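To check the separator handling described above, here is a minimal sketch that runs the same regex against one small sample joined with \n, \r\n, and \r in turn. The class name SeparatorDemo, the helper findBlocks, and the shortened sample data are assumptions made for illustration, not part of the original answer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SeparatorDemo {
    // Same regex as above. The backslashes are doubled so that the regex
    // engine, rather than the Java compiler, interprets \r and \n.
    static final Pattern BLOCK =
        Pattern.compile("(?m)^truck(?:(?:\\r\\n|[\\r\\n]).+$)*");

    // Hypothetical helper: returns every block that starts with a "truck" line.
    static List<String> findBlocks(String data) {
        List<String> blocks = new ArrayList<>();
        Matcher m = BLOCK.matcher(data);
        while (m.find()) {
            blocks.add(m.group());
        }
        return blocks;
    }

    public static void main(String[] args) {
        // The same shortened two-block sample, joined with three separator styles.
        for (String sep : new String[] {"\n", "\r\n", "\r"}) {
            String data = String.join(sep, "truck", "zDoug", "30", "", "vehicle", "eRob");
            List<String> blocks = findBlocks(data);
            int lines = blocks.get(0).split("\\r\\n|[\\r\\n]").length;
            System.out.println(blocks.size() + " block(s), " + lines + " line(s)");
        }
    }
}
```

With this sample, each separator style should yield a single block of three lines, because .+ needs at least one character and therefore cannot match the empty line that ends the block.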

Once you've matched a block of data, you can split it on the line separators to get the individual lines. Here's an example:

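import java.util.regex.Matcher;
import java.util.regex.Pattern;

// data holds the entire file contents as a single String
Pattern p0 = Pattern.compile("(?m)^truck(?:(?:\r\n|[\r\n]).+$)*");
Matcher m = p0.matcher(data);
while (m.find())
{
  String fullMatch = m.group();
  int n = 0;
  // split the matched block into its individual lines
  for (String s : fullMatch.split("\r\n|[\r\n]"))
  {
    System.out.printf("line %d: %s%n", n++, s);
  }
}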

Output:

line 0: truck
line 1: zDoug
line 2: Doug's house
line 3: (123) 456-7890
line 4: Edoug@doug.com
line 5: 30
line 6: 61234.56
line 7: 8/10/2003

I'm also assuming each line of data contains at least one character, and that the blank lines between data blocks are really empty, i.e., no spaces, TABs, or other invisible characters.
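If that second assumption does not hold, one possible variant (not part of the original answer) is to require at least one non-whitespace character per data line, so that a "blank" line containing only spaces or TABs still ends the block. A minimal sketch, with the class name BlankLineTolerantDemo and the helper firstBlock made up for illustration:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BlankLineTolerantDemo {
    // Variant of the regex above: ".*\S.*" replaces ".+", so each data line
    // must contain at least one non-whitespace character.
    static final Pattern BLOCK =
        Pattern.compile("(?m)^truck(?:(?:\\r\\n|[\\r\\n]).*\\S.*$)*");

    // Hypothetical helper: returns the first "truck" block, or null if none.
    static String firstBlock(String data) {
        Matcher m = BLOCK.matcher(data);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // The separator line between the blocks holds a space and a TAB.
        String data = "truck\nzDoug\n30\n \t\nvehicle\neRob\n";
        System.out.println(firstBlock(data));
    }
}
```

With the original .+ the whitespace-only line would count as data and the match would run on into the next block; with this variant the match should stop after the line "30".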

(BTW: To test that regex in RegExr, remove the (?m) and check the multiline box instead. RegExr is powered by ActionScript, so the rules are a little different. For a Java-powered regex tester, check out RegexPlanet.)

Answer 2

This pattern should work: ((.*|\n)*)

Answer 3

I think that in order to span multiple lines, your Pattern should be compiled in DOTALL mode, like this:

Pattern p = Pattern.compile("truck\\n(.*\\n){7}", Pattern.DOTALL);
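One thing to check when trying this: under DOTALL, . also matches line terminators, so a greedy .* is free to run past the end of a line, and a repeated group such as (...){7} retains only its last repetition. A minimal sketch, assuming the goal is the eight separate groups from the question, leaves DOTALL off and spells out one group per line. The class name EightGroupsDemo and its helpers are hypothetical, and the separator alternation is borrowed from Answer 1.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EightGroupsDemo {
    // Line separator alternation borrowed from Answer 1.
    static final String SEP = "(?:\\r\\n|[\\r\\n])";

    // "(truck)" followed by seven "(.+)" groups, one per line. DOTALL is
    // not set, so "." stops at the end of each line.
    static final Pattern BLOCK = Pattern.compile(buildRegex());

    static String buildRegex() {
        StringBuilder sb = new StringBuilder("(?m)^(truck)");
        for (int i = 0; i < 7; i++) {
            sb.append(SEP).append("(.+)");
        }
        return sb.append("$").toString();
    }

    // Hypothetical helper: returns the eight captured lines of the first
    // block, or an empty array if no block matches.
    static String[] capture(String data) {
        Matcher m = BLOCK.matcher(data);
        if (!m.find()) {
            return new String[0];
        }
        String[] groups = new String[m.groupCount()];
        for (int i = 1; i <= m.groupCount(); i++) {
            groups[i - 1] = m.group(i);
        }
        return groups;
    }

    public static void main(String[] args) {
        String data = "truck\nzDoug\nDoug's house\n(123) 456-7890\n"
            + "Edoug@doug.com\n30\n61234.56\n8/10/2003\n\n"
            + "vehicle\neRob\nRob's house\n(987) 654-3210\nFrob@rob.com\n";
        String[] groups = capture(data);
        for (int i = 0; i < groups.length; i++) {
            System.out.printf("group %d: %s%n", i + 1, groups[i]);
        }
    }
}
```

Like Answer 1, this sketch assumes every data line contains at least one character and that the whole text is held in a single string.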


3 answers

Answer 1
5 accepted 2011-03-03 07:21:40

Answer 2
3 2015-04-24 15:13:24

Answer 3
3 2011-03-03 03:50:21
