RegEx - problem with multiline input

I have a String with multiline content and want to select a multiline region, preferably using a regular expression (just because I'm trying to understand Java RegEx at the moment).

Consider input like this:

Line 1
abc START def
Line 2
Line 3
gh END jklm
Line 4

Assuming START and END are unique and mark the start and end of the region, I'd like to create a Pattern/Matcher that gives this result:

 def
Line 2
Line 3
gh 

My current attempt is:

Pattern p = Pattern.compile("START(.*)END");
Matcher m = p.matcher(input);
if (m.find())
  System.out.println(m.group(1));

But the result is:

gh

So m.start() seems to point at the beginning of the line that contains the 'end marker'. I tried adding Pattern.MULTILINE to the compile call, but that (alone) didn't change anything.

Where is my mistake?

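You want Pattern.DOTALL, so that the . wildcard also matches newline characters. MULTILINE addresses a different issue: it makes the ^ and $ anchors match at the start and end of each line instead of only at the start and end of the whole input.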

Pattern p = Pattern.compile("START(.*)END", Pattern.DOTALL);
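To see what each flag actually changes, here is a small self-contained sketch. It assumes the question's input with \n line terminators; the class and method names (DotAllDemo, extract, countWholeLines) are hypothetical and only for illustration. Without DOTALL, Java's . stops at every line terminator, so START(.*)END finds no match that spans lines, and adding MULTILINE alone does not change that.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DotAllDemo {
    // The question's sample input, assuming \n line terminators.
    static final String INPUT = "Line 1\nabc START def\nLine 2\nLine 3\ngh END jklm\nLine 4";

    // Returns group 1 of the first match of START(.*)END, or null if there is none.
    static String extract(int flags) {
        Matcher m = Pattern.compile("START(.*)END", flags).matcher(INPUT);
        return m.find() ? m.group(1) : null;
    }

    // Counts matches of ^Line \d$, i.e. lines consisting only of "Line <digit>".
    static int countWholeLines(int flags) {
        Matcher m = Pattern.compile("^Line \\d$", flags).matcher(INPUT);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // Without DOTALL, . cannot cross the \n after "def", so there is no match.
        System.out.println("no flags:  " + extract(0));                 // null
        System.out.println("MULTILINE: " + extract(Pattern.MULTILINE)); // still null
        // With DOTALL, group 1 is " def\nLine 2\nLine 3\ngh ".
        System.out.println("DOTALL:    " + extract(Pattern.DOTALL));

        // MULTILINE only changes ^ and $: they now also match at line boundaries.
        System.out.println(countWholeLines(0));                 // 0
        System.out.println(countWholeLines(Pattern.MULTILINE)); // 4
    }
}
```

The last two calls show what MULTILINE is for: ^Line \d$ matches the four "Line n" lines only once ^ and $ are allowed to match at line boundaries.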

You want to set Pattern.DOTALL (so that your . wildcard also matches line terminator characters), as this test shows:

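import java.util.regex.Matcher;
import java.util.regex.Pattern;

import org.junit.Assert;
import org.junit.Test;

public class MultilineRegexTest {

    @Test
    public void testMultilineRegex() throws Exception {
        final String input = "Line 1\nabc START def\nLine 2\nLine 3\ngh END jklm\nLine 4";
        final String expected = " def\nLine 2\nLine 3\ngh ";
        // DOTALL lets . match the \n characters between START and END
        final Pattern p = Pattern.compile("START(.*)END", Pattern.DOTALL);
        final Matcher m = p.matcher(input);
        if (m.find()) {
            Assert.assertEquals(expected, m.group(1));
        } else {
            Assert.fail("pattern not found");
        }
    }
}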

By default, the regex metacharacter . does not match a newline (or any other line terminator). You can try this regex instead:

START([\w\W]*)END

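which uses the character class [\w\W] in place of the dot. In a Java string literal the backslashes must be doubled: "START([\\w\\W]*)END".

[\w\W] matches any character that is either a word character or a non-word character. Since every character is one or the other, the class effectively matches everything, including newlines.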
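A minimal Java sketch of the character-class approach follows; the class and method names (CharClassDemo, extract) are hypothetical. It compiles the pattern without any flags, relying on [\w\W] to cross the line breaks.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CharClassDemo {
    // Regex START([\w\W]*)END; in a Java string literal each backslash is doubled.
    static final Pattern ANY_CHAR_CLASS = Pattern.compile("START([\\w\\W]*)END");

    // Returns group 1 of the first match, or null if there is none.
    static String extract(String input) {
        Matcher m = ANY_CHAR_CLASS.matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String input = "Line 1\nabc START def\nLine 2\nLine 3\ngh END jklm\nLine 4";
        // No DOTALL flag needed: \n is a non-word character, so [\w\W] matches it.
        System.out.println(extract(input));
    }
}
```

For this input it yields the same group as the DOTALL version; DOTALL is arguably easier to read, while [\w\W] works without setting any flag.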
