I want to match text between two Strings, but the starting String has strict boundary conditions.
Sample input:
start
From: h
From:b
xyz
Subject:
end
I need to match the text between From: and Subject:.
If I use (From:.*).*(Subject:) with DOTALL, it produces
From: h
From:b
xyz
Subject:
but I need only
From:b
xyz
Subject:
because the starting string has strict boundary conditions. This matters because the starting string can appear anywhere in the document, and the regex above would then match a large block of text rather than just a few lines.
Problem redefined:
I have text in which I need to match:
From:<any text>
To:<any text>
Subject:<any text>
The catch is that the three components can be on one line, separated by one newline, or separated by two newlines. The text before and after the desired match may also contain From:<any text>, which is why I need strict boundaries.
Try this out:
String input = "start From: h From:b xyz Subject: end";
Matcher matcher = Pattern.compile("(?<=^((?!From:).)*(From: [A-Za-z0-9] ))(.+?)(Subject:)").matcher(input);
if (matcher.find())
{
    System.out.println(matcher.group());
}
Output: From:b xyz Subject:
Explanation of the regex (?<=^((?!From:).)*(From: [A-Za-z0-9] ))(.+?)(Subject:):
(?<=    start looking behind
^    the start of the string
((?!From:).)    if looking ahead you can't see "From:", match any character
*    matches the previous statement zero or more times
(From: [A-Za-z0-9] )    matches the first "From:" and its contents
)    stop looking behind
(.+?)    matches the string we are looking for
(Subject:)    matches the subject field
Instead of using .* in DOTALL mode, I suggest you match one line at a time, after asserting that the line doesn't start with From:.
"(?m)^From:.*[\r\n]+(?:(?!From:).*[\r\n]+)*Subject:.*$"
That's the minimum implementation. Depending on how your text is structured, it could still match too much or too slowly (especially in cases where no match is possible). Here's a more robust version:
"(?m)^(?>From:.*[\r\n]+)(?>(?!From:|Subject:).*[\r\n]+)*+Subject:.*$"
Use the DOTALL modifier (?s) and a negative lookahead:
(?s)From:((?!From:).)*?Subject:
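A minimal Java sketch of this pattern, assuming the sample input from the question. The class name TemperedDotMatch and the helper firstBlock are invented for the example. Note that, unlike the line-at-a-time patterns, this one does not require From: to sit at the start of a line.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class TemperedDotMatch {
    // Each character is consumed only if it does not begin another "From:",
    // so a match can never span a second From: marker.
    static final Pattern BLOCK = Pattern.compile("(?s)From:((?!From:).)*?Subject:");

    // Hypothetical helper: returns the first matched block, or null if there is none.
    static String firstBlock(String text) {
        Matcher m = BLOCK.matcher(text);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // Multi-line sample from the question.
        System.out.println(firstBlock("start\nFrom: h\nFrom:b\nxyz\nSubject:\nend"));
        // Same content on a single line.
        System.out.println(firstBlock("start From: h From:b xyz Subject: end"));
    }
}
```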
@ regex101
NOTE: the regex101 fiddle contains the live regex and test data.
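For the redefined problem (From:, To: and Subject: on one line or separated by one or two newlines), the same negative-lookahead idea could be extended roughly as follows. This is a sketch under assumptions, not a tested solution for real mail headers: it reads "separated by 2 newlines" as at most two line terminators between fields, it assumes field values never contain the literal markers From:, To: or Subject:, and the names HeaderTriple and extract are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class HeaderTriple {
    // Each field value stays on its own line and may not run into the next
    // marker; between fields, zero to two line terminators are allowed.
    static final Pattern HEADERS = Pattern.compile(
        "From:((?:(?!From:|To:)[^\\r\\n])*)(?:\\r?\\n){0,2}"
      + "To:((?:(?!From:|To:|Subject:)[^\\r\\n])*)(?:\\r?\\n){0,2}"
      + "Subject:([^\\r\\n]*)");

    // Hypothetical helper: returns {from, to, subject} trimmed, or null if no match.
    static String[] extract(String text) {
        Matcher m = HEADERS.matcher(text);
        if (!m.find()) {
            return null;
        }
        return new String[] { m.group(1).trim(), m.group(2).trim(), m.group(3).trim() };
    }

    public static void main(String[] args) {
        String input = "start\nFrom: h\nFrom:b\n\nTo:c\nSubject: hi\nend";
        String[] fields = extract(input);
        // The stray "From: h" is skipped because no To: follows it within two newlines.
        System.out.println(String.join(" | ", fields));
    }
}
```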