Situation: I have some text and can use only one regex group to reach my goal. I need to cut the text off at a line of 5 or more "=" characters and remove double blank lines.
This is the regex I use to match the text (the programming language is Java). It matches everything before a newline that is followed by 5 or more "=":
([^]+?)\n[=]{5,}
Now I need to replace all double empty lines in the matching group. I cannot change the Java code; the only things I can change are the regex itself and which matching group is taken from the result.
Sample Text:
Hello World
this is text.
Cheers
================
Unimportant text
should result in:
Hello World
this is text.
Cheers
The Java code is the following, but can't be changed:
String regex = "([\\s|\\S]+?)\n[=]{5,}";
Pattern pattern = Pattern.compile(regex, Pattern.MULTILINE);
Matcher matcher = pattern.matcher(text);
while (matcher.find()) {
for (int i = 0; i < matcher.groupCount(); i++) {
System.out.println("Group " + i + ":\n" + matcher.group(i));
}
}
Only the regex can be changed.
I don't believe that regular expressions are capable of intelligently doing this in a single pass (2 passes is cake).
However, I've devised something a bit ugly. A standard repeat quantifier won't do, because you want to modify the subcontents and you don't have access to the underlying Java code.
(?:([\s\S]*?)(?:(\n\n)\n\n)?)(?:([\s\S]*?)(\n\n)\n\n)?([\s\S]*?)={5,}[\s\S]*
It captures everything before the first run of four consecutive newlines as $1, and the first two of those newlines as $2, for use in the replacement later.
The next group is the same, except that it is followed by a ? quantifier, meaning 0 or 1 times, which makes it optional. This group captures the content as $3 and the newlines as $4.
Finally, the last group captures the content at the end as $5.
You can repeat this optional group as many times as you like.
Here's a version with four repetitions following the same pattern: groups $1, $3, $5, $7, and $9 contain the contents between the excessive newlines, groups $2, $4, $6, $8, and $10 contain the two newlines to keep, and $11 contains the remaining contents.
(?:([\s\S]*?)(?:(\n\n)\n\n)?)(?:([\s\S]*?)(\n\n)\n\n)?(?:([\s\S]*?)(\n\n)\n\n)?(?:([\s\S]*?)(\n\n)\n\n)?(?:([\s\S]*?)(\n\n)\n\n)?([\s\S]*?)={5,}[\s\S]*
If you use the regex immediately above, your replacement would look something like $1$2$3$4$5$6$7$8$9$10$11.
It's not pretty, for sure, but it's working with what you have.
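As a minimal sketch of how the replacement above could be applied, assuming a replace call is available (the fixed harness in the question does not offer one): the class name CollapseWithGroups and the helper collapse are hypothetical, and the regex is the four-repetition version copied unchanged. In Java, a group that did not take part in the match contributes nothing to the replacement, so the unused optional groups simply drop out.

```java
import java.util.regex.Pattern;

class CollapseWithGroups {
    // The four-repetition regex from the answer, copied unchanged.
    static final Pattern PATTERN = Pattern.compile(
          "(?:([\\s\\S]*?)(?:(\\n\\n)\\n\\n)?)"
        + "(?:([\\s\\S]*?)(\\n\\n)\\n\\n)?"
        + "(?:([\\s\\S]*?)(\\n\\n)\\n\\n)?"
        + "(?:([\\s\\S]*?)(\\n\\n)\\n\\n)?"
        + "(?:([\\s\\S]*?)(\\n\\n)\\n\\n)?"
        + "([\\s\\S]*?)={5,}[\\s\\S]*");

    // Hypothetical helper: rebuilds the text from the captured groups.
    // Each run of four newlines is written back as two, and the "=" line
    // plus everything after it is matched but never written back.
    static String collapse(String text) {
        return PATTERN.matcher(text).replaceAll("$1$2$3$4$5$6$7$8$9$10$11");
    }

    public static void main(String[] args) {
        String text = "Hello World\n\n\n\nthis is text.\n\n\n\nCheers\n"
                    + "================\nUnimportant text";
        System.out.println(collapse(text));
    }
}
```

Text containing more runs of excess newlines than the regex has optional groups would leave the later runs untouched, which is why the pattern has to be repeated by hand.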
Finally, an explanation of the first regex (since the second is the same with more repetitions).
(?: # Opens NCG1
( # Opens CG1
[\s\S]*? # Character class (any of the characters within)
# \s together with its negation \S: a common idiom meaning any character, including newlines.
# * repeats zero or more times
# ? as few times as possible
) # Closes CG1
(?: # Opens NCG2
( # Opens CG2
\n # Token: \n (newline)
\n # Token: \n (newline)
) # Closes CG2
\n # Token: \n (newline)
\n # Token: \n (newline)
)? # Closes NCG2
# ? repeats zero or one times
) # Closes NCG1
# begin repeat section
(?: # Opens NCG3
( # Opens CG3
[\s\S]*? # Character class (any of the characters within)
# \s together with its negation \S: a common idiom meaning any character, including newlines.
) # Closes CG3
( # Opens CG4
\n # Token: \n (newline)
\n # Token: \n (newline)
) # Closes CG4
\n # Token: \n (newline)
\n # Token: \n (newline)
)? # Closes NCG3
# end repeat section
( # Opens CG5
[\s\S]*? # Character class (any of the characters within)
) # Closes CG5
={5,} # Literal =
# Repeats 5 or more times
[\s\S]* # Character class (any of the characters within)
# * repeats zero or more times
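The annotated layout above can also be kept in the source itself. The following is a sketch, assuming you are free to compile the pattern yourself: Java's Pattern.COMMENTS flag makes the engine ignore whitespace and "#" comments inside the pattern. The class name CommentedRegex and the helper collapse are hypothetical; the regex is the first (single-repetition) one.

```java
import java.util.regex.Pattern;

class CommentedRegex {
    // First regex from the answer, written in free-spacing form.
    // With Pattern.COMMENTS, whitespace and "# ..." comments are skipped.
    static final Pattern PATTERN = Pattern.compile(
          "(?:             # NCG1\n"
        + "  ([\\s\\S]*?)   # CG1: any characters, as few as possible\n"
        + "  (?:           # NCG2\n"
        + "    (\\n\\n)     # CG2: the two newlines to keep\n"
        + "    \\n\\n       # two further newlines, not captured\n"
        + "  )?            # NCG2 is optional\n"
        + ")\n"
        + "(?:             # NCG3, the repeatable section\n"
        + "  ([\\s\\S]*?)   # CG3: any characters, as few as possible\n"
        + "  (\\n\\n)       # CG4: the two newlines to keep\n"
        + "  \\n\\n         # two further newlines, not captured\n"
        + ")?              # NCG3 is optional\n"
        + "([\\s\\S]*?)     # CG5: content before the line of equals signs\n"
        + "={5,}           # five or more equals signs\n"
        + "[\\s\\S]*        # everything after\n",
        Pattern.COMMENTS);

    // Hypothetical helper: applies the replacement for the five-group regex.
    static String collapse(String text) {
        return PATTERN.matcher(text).replaceAll("$1$2$3$4$5");
    }

    public static void main(String[] args) {
        System.out.println(collapse("a\n\n\n\nb\n=====\nrest"));
    }
}
```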
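try {
    // Pass 1: remove the first run of 5 or more "=" and everything after it
    // (the s flag lets . match newlines as well).
    String resultString = YOURSTRING.replaceAll("(?ism)[=]{5,}.*", "");
    // Pass 2: remove whitespace-only lines
    // (the m flag makes ^ and $ match at line boundaries).
    resultString = resultString.replaceAll("(?ism)^\\s+$", "");
} catch (PatternSyntaxException ex) {
    // Syntax error in the regular expression
} catch (IllegalArgumentException ex) {
    // Syntax error in the replacement text (unescaped $ signs?)
} catch (IndexOutOfBoundsException ex) {
    // Non-existent backreference used in the replacement text
}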
The first regex removes [=]{5,} (5 or more "=") and all text after it.
The second cleans up blank lines.
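A self-contained sketch of the two passes, assuming the calling code can be modified (which the question rules out). The class name TwoPassCleanup and the helper clean are hypothetical, the two regexes are the ones from the snippet, and the sample input assumes the original text separated its paragraphs with two blank lines.

```java
class TwoPassCleanup {
    // Hypothetical helper wrapping the two replaceAll calls.
    static String clean(String text) {
        // Pass 1: drop the first run of 5 or more "=" and everything after it.
        String result = text.replaceAll("(?ism)[=]{5,}.*", "");
        // Pass 2: remove whitespace-only lines, so a double blank line
        // shrinks to a single one.
        return result.replaceAll("(?ism)^\\s+$", "");
    }

    public static void main(String[] args) {
        String text = "Hello World\n\n\nthis is text.\n\n\nCheers\n"
                    + "================\nUnimportant text";
        System.out.println(clean(text));
    }
}
```

For this input the paragraphs end up separated by a single blank line, and the text stops after "Cheers" and its trailing newline.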