
Match for multiple lines of text between delimiters in Java

How can I match multiple lines of text between delimiters in Java?

The question is best explained with an example:

...
unimportant text
EndOfEntry
Key=Value
unimportant text
maybe a few lines of unimportant text
AnotherKey=AnotherValue
EndOfEntry
more unimportant text
...

In the above, I want to match Key=Value.*AnotherKey=AnotherValue, with both pairs appearing together in one entry. I simply want to know whether or not the pattern appears; I don't need to replace anything.

However, with the same desired match, if given multiple entries, such as:

...
unimportant text
EndOfEntry
Key=Value
unimportant text
maybe a few lines of unimportant text
AnotherKey=NotMyValue
EndOfEntry
RandomKey=Value
unimportant text
maybe a few lines of unimportant text
AnotherKey=AnotherValue
EndOfEntry
more unimportant text
...

I wouldn't want the above to be matched successfully, because we don't see exactly Key=Value and AnotherKey=AnotherValue inside a single "entry". Instead, we see Key=Value in the first entry and AnotherKey=AnotherValue in the second entry.

I've been trying with a regex like this one (and of course [\S\s] can be replaced by . with the DOTALL option for Pattern):

Key=Value[\S\s]*?AnotherKey=AnotherValue

but of course that matches both. I've also tried:

Key=Value[^EndOfEntry]*?AnotherKey=AnotherValue

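but that doesn't work either, because [^EndOfEntry] is a character class: it excludes the individual characters E, n, d, O, f, t, r and y rather than the word EndOfEntry, so the match stops at the first such letter (for example, the n in "unimportant").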
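To see why both attempts fall short, here is a small check using shortened versions of the two samples above (the class and constant names are made up for this sketch). The lazy [\S\s]*? version reports a match in the sample that should fail, and the [^EndOfEntry] version reports no match in the sample that should succeed.

```java
import java.util.regex.Pattern;

public class FirstAttempts {
    // Shortened second sample: the pairs sit in different entries (should NOT match).
    static final String INVALID = String.join("\n",
            "unimportant text", "EndOfEntry",
            "Key=Value", "unimportant text", "AnotherKey=NotMyValue", "EndOfEntry",
            "RandomKey=Value", "unimportant text", "AnotherKey=AnotherValue", "EndOfEntry");

    // Shortened first sample: both pairs sit in the same entry (should match).
    static final String VALID = String.join("\n",
            "unimportant text", "EndOfEntry",
            "Key=Value", "unimportant text", "AnotherKey=AnotherValue", "EndOfEntry");

    // [\S\s] matches any character, so the lazy quantifier happily crosses EndOfEntry.
    static final Pattern LAZY =
            Pattern.compile("Key=Value[\\S\\s]*?AnotherKey=AnotherValue");

    // [^EndOfEntry] rejects the single letters E, n, d, O, f, t, r, y,
    // not the word "EndOfEntry", so it cannot get past "unimportant".
    static final Pattern CHAR_CLASS =
            Pattern.compile("Key=Value[^EndOfEntry]*?AnotherKey=AnotherValue");

    public static void main(String[] args) {
        System.out.println(LAZY.matcher(INVALID).find());     // true: false positive
        System.out.println(CHAR_CLASS.matcher(VALID).find()); // false: false negative
    }
}
```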

Is there a single regex that matches precisely what I'm looking for? Would it be simpler to strip the newlines first, or to use some other two-step processing (which I'm trying to avoid, simply for the sake of learning)?
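For comparison, the two-step processing mentioned in the question could look roughly like this: split the text on the delimiter, then search each entry on its own. This is a sketch, assuming the literal text EndOfEntry never occurs inside an entry; the class and method names are hypothetical. The word boundaries (\b) are still needed so that RandomKey=Value does not count as Key=Value.

```java
import java.util.regex.Pattern;

public class SplitThenCheck {
    // Word boundaries keep "RandomKey=Value" from counting as "Key=Value".
    static final Pattern PAIR = Pattern.compile(
            "\\bKey=Value\\b.*?\\bAnotherKey=AnotherValue\\b", Pattern.DOTALL);

    // Shortened samples: pairs split across entries vs. pairs in one entry.
    static final String INVALID = String.join("\n",
            "Key=Value", "AnotherKey=NotMyValue", "EndOfEntry",
            "RandomKey=Value", "AnotherKey=AnotherValue", "EndOfEntry");
    static final String VALID = String.join("\n",
            "unimportant text", "EndOfEntry",
            "Key=Value", "unimportant text", "AnotherKey=AnotherValue", "EndOfEntry");

    // Step 1: cut the text into entries; step 2: look for the pair inside each one.
    // Assumes "EndOfEntry" only ever appears as the delimiter.
    static boolean pairInOneEntry(String text) {
        for (String entry : text.split("EndOfEntry")) {
            if (PAIR.matcher(entry).find()) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(pairInOneEntry(INVALID)); // false
        System.out.println(pairInOneEntry(VALID));   // true
    }
}
```

Because each entry is searched separately, a plain lazy .*? is enough here; the single-regex answer below avoids the split by making the "anything" part refuse to cross EndOfEntry.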

You should simply use:

\bKey=Value\b(?:(?!EndOfEntry).)*?\bAnotherKey=AnotherValue\b

(with the DOTALL flag, as you suggested in your question).

You can experiment with it live on regex101.
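The DOTALL flag matters because the two pairs sit on different lines. A quick sketch (hypothetical class name, shortened sample) showing the same regex without the flag, with Pattern.DOTALL, and with the equivalent embedded flag (?s):

```java
import java.util.regex.Pattern;

public class DotAllDemo {
    static final String REGEX =
            "\\bKey=Value\\b(?:(?!EndOfEntry).)*?\\bAnotherKey=AnotherValue\\b";

    // Both pairs in one entry, on separate lines.
    static final String VALID = String.join("\n",
            "EndOfEntry", "Key=Value", "unimportant text", "AnotherKey=AnotherValue", "EndOfEntry");

    static boolean withoutFlag() {
        // Without DOTALL, '.' does not match '\n', so the two lines never connect.
        return Pattern.compile(REGEX).matcher(VALID).find();
    }

    static boolean withDotAll() {
        return Pattern.compile(REGEX, Pattern.DOTALL).matcher(VALID).find();
    }

    static boolean withEmbeddedFlag() {
        // (?s) at the start of the regex turns on DOTALL mode.
        return Pattern.compile("(?s)" + REGEX).matcher(VALID).find();
    }

    public static void main(String[] args) {
        System.out.println(withoutFlag());      // false
        System.out.println(withDotAll());       // true
        System.out.println(withEmbeddedFlag()); // true
    }
}
```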


How it works:

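I've basically replaced your .* with the expression ((?!EndOfEntry).)*, which matches any run of text that doesn't contain EndOfEntry: before . consumes each character, the negative lookahead (?!EndOfEntry) checks that EndOfEntry does not start at that position. In the final regex the group is non-capturing, (?:...), and the quantifier is lazy, *?.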
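The "anything except EndOfEntry" part can be tried on its own. In this sketch (hypothetical names), matches() requires the whole input to be covered, so it returns true only when the text does not contain the full word EndOfEntry:

```java
import java.util.regex.Pattern;

public class TemperedDotDemo {
    // Each '.' is consumed only after the lookahead confirms that
    // "EndOfEntry" does not start at the current position.
    static final Pattern NO_END_MARKER =
            Pattern.compile("(?:(?!EndOfEntry).)*", Pattern.DOTALL);

    static boolean spanIsMarkerFree(String s) {
        // matches() succeeds only if the pattern covers the entire input
        return NO_END_MARKER.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(spanIsMarkerFree("unimportant text\nmore text"));    // true
        System.out.println(spanIsMarkerFree("unimportant text\nEndOfEntry\n")); // false
        System.out.println(spanIsMarkerFree("EndOfEntr"));                      // true: only the full word is blocked
    }
}
```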

In addition, to avoid matching the pairs RandomKey=Value and AnotherKey=AnotherValue (RandomKey=Value contains Key=Value as a substring, for example), I've added another little tweak:

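I've surrounded your pairs with \b, which asserts that we're at a word boundary, so we only get a match when the entire word matches: \bKey=Value no longer matches inside RandomKey=Value. (You could also use \s, which matches any whitespace character, but unlike \b it consumes that character and requires one to be present, so it fails at the very start of the input.)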
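Here is a sketch of what the word boundaries prevent, using a shortened version of the second sample (class and constant names are made up). Without \b, the regex finds Key=Value inside RandomKey=Value and pairs it with AnotherKey=AnotherValue in the same entry:

```java
import java.util.regex.Pattern;

public class WordBoundaryDemo {
    // Pairs split across entries, but the second entry has RandomKey=Value.
    static final String INVALID = String.join("\n",
            "EndOfEntry", "Key=Value", "AnotherKey=NotMyValue", "EndOfEntry",
            "RandomKey=Value", "AnotherKey=AnotherValue", "EndOfEntry");

    // Same tempered pattern, with and without the \b assertions.
    static final Pattern WITHOUT_B = Pattern.compile(
            "Key=Value(?:(?!EndOfEntry).)*?AnotherKey=AnotherValue", Pattern.DOTALL);
    static final Pattern WITH_B = Pattern.compile(
            "\\bKey=Value\\b(?:(?!EndOfEntry).)*?\\bAnotherKey=AnotherValue\\b", Pattern.DOTALL);

    public static void main(String[] args) {
        System.out.println(WITHOUT_B.matcher(INVALID).find()); // true: matched inside "RandomKey=Value"
        System.out.println(WITH_B.matcher(INVALID).find());    // false
    }
}
```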


Here's a piece of Java code that uses the regex I'm suggesting against your examples:

import java.util.regex.Pattern;

public class EntryMatchDemo {
    public static void main(String[] args) {
        // DOTALL lets '.' match line terminators too, so the match can span several lines
        final Pattern pattern = Pattern.compile(
                "\\bKey=Value\\b(?:(?!EndOfEntry).)*?\\bAnotherKey=AnotherValue\\b",
                Pattern.DOTALL);

        // Key=Value and AnotherKey=AnotherValue are in different entries: no match expected
        final String invalid = "unimportant text\n" +
                "EndOfEntry\n" +
                "Key=Value\n" +
                "unimportant text\n" +
                "maybe a few lines of unimportant text\n" +
                "AnotherKey=NotMyValue\n" +
                "EndOfEntry\n" +
                "RandomKey=Value\n" +
                "unimportant text\n" +
                "maybe a few lines of unimportant text\n" +
                "AnotherKey=AnotherValue\n" +
                "EndOfEntry\n" +
                "more unimportant text";

        // Both pairs are in the same entry: a match is expected
        final String valid = "unimportant text\n" +
                "EndOfEntry\n" +
                "Key=Value\n" +
                "unimportant text\n" +
                "maybe a few lines of unimportant text\n" +
                "AnotherKey=AnotherValue\n" +
                "EndOfEntry\n" +
                "more unimportant text";

        // find() searches for the pattern anywhere in the input
        System.out.println(pattern.matcher(invalid).find());
        System.out.println(pattern.matcher(valid).find());
    }
}

Output:

false
true
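If the keys and values change from case to case, the same regex could be assembled from arbitrary strings. A sketch with hypothetical helpers entryPairPattern and pairInOneEntry, assuming each pair starts and ends with a word character (otherwise \b behaves differently) and using made-up sample lines. Pattern.quote escapes regex metacharacters such as the '.' in host=a.example.

```java
import java.util.regex.Pattern;

public class EntryPairMatcher {
    // Builds \bFIRST\b(?:(?!DELIMITER).)*?\bSECOND\b from literal strings.
    static Pattern entryPairPattern(String first, String second, String delimiter) {
        String regex = "\\b" + Pattern.quote(first) + "\\b"
                + "(?:(?!" + Pattern.quote(delimiter) + ").)*?"
                + "\\b" + Pattern.quote(second) + "\\b";
        return Pattern.compile(regex, Pattern.DOTALL);
    }

    // Hypothetical convenience wrapper with the delimiter from the question.
    static boolean pairInOneEntry(String text, String first, String second) {
        return entryPairPattern(first, second, "EndOfEntry").matcher(text).find();
    }

    public static void main(String[] args) {
        String text = String.join("\n",
                "EndOfEntry", "host=a.example", "port=8080", "EndOfEntry");
        System.out.println(pairInOneEntry(text, "host=a.example", "port=8080")); // true
        System.out.println(pairInOneEntry(text, "host=aXexample", "port=8080")); // false: quoted '.' is literal
    }
}
```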
