How to match a string with a regex only if it's between two delimiters?
My goal is to delete all matches from an input using a regular expression with Java 7:
input.replaceAll([regex], "");
Given this example input, with a target string abc-:
<TAG>test-test-abc-abc-test-abc-test-</TAG>test-abc-test-abc-<TAG>test-abc-test-abc-abc-</TAG>
What regex could I use in the code above to match abc- only when it is between the <TAG> and </TAG> delimiters?
Here is the desired matching behaviour, with <--> marking each match:
               <--><-->     <-->                                       <-->     <--><-->
<TAG>test-test-abc-abc-test-abc-test-</TAG>test-abc-test-abc-<TAG>test-abc-test-abc-abc-</TAG>
Expected result:
<TAG>test-test-test-test-</TAG>test-abc-test-abc-<TAG>test-test-</TAG>
The left and right delimiters are always different. I am not particularly looking for a recursive solution (nested delimiters).
I think this might be doable with lookaheads and/or lookbehinds, but I didn't get anywhere with them.
You can use a regex like
(?s)(\G(?!^)|<TAG>(?=.*?</TAG>))((?:(?!<TAG>|</TAG>).)*?)abc-
See the regex demo. Replace with $1$2. Details:
(?s) - a Pattern.DOTALL embedded flag option, so that . also matches line breaks
(\G(?!^)|<TAG>(?=.*?</TAG>)) - Group 1 ($1): either of the two:
    \G(?!^) - the end of the previous successful match (the (?!^) lookahead excludes the start of the string, where \G also matches)
    | - or
    <TAG>(?=.*?</TAG>) - <TAG> that is followed with any zero or more chars, as few as possible, and then </TAG> (thus, we make sure there actually is a closing, right-hand delimiter further in the string)
((?:(?!<TAG>|</TAG>).)*?) - Group 2 ($2): any one char (.), zero or more repetitions, but as few as possible (*?), that does not start a <TAG> or </TAG> char sequence (aka a tempered greedy token)
abc- - the pattern to be removed, abc-.
In Java:
String pattern = "(?s)(\\G(?!^)|<TAG>(?=.*?</TAG>))((?:(?!<TAG>|</TAG>).)*?)abc-";
String result = text.replaceAll(pattern, "$1$2");
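To check the pattern against the input from the question, here is a minimal self-contained sketch. The class name TagScopedRemoval and the method name removeInsideTags are made up for illustration; only the regex and the $1$2 replacement come from the answer above. It compiles the pattern once and uses only APIs available in Java 7.

```java
import java.util.regex.Pattern;

// Hypothetical wrapper class; the name is an assumption for this sketch.
class TagScopedRemoval {
    // The regex from the answer: Group 1 anchors each match either at the end
    // of the previous match (\G) or at an opening <TAG> that has a closing
    // </TAG> somewhere ahead; Group 2 is the text skipped before "abc-".
    private static final Pattern ABC_INSIDE_TAG = Pattern.compile(
            "(?s)(\\G(?!^)|<TAG>(?=.*?</TAG>))((?:(?!<TAG>|</TAG>).)*?)abc-");

    // Removes every "abc-" that sits between <TAG> and </TAG>.
    // Groups 1 and 2 are written back, so only "abc-" itself is dropped.
    public static String removeInsideTags(String text) {
        return ABC_INSIDE_TAG.matcher(text).replaceAll("$1$2");
    }

    public static void main(String[] args) {
        String input = "<TAG>test-test-abc-abc-test-abc-test-</TAG>"
                + "test-abc-test-abc-"
                + "<TAG>test-abc-test-abc-abc-</TAG>";
        // Prints:
        // <TAG>test-test-test-test-</TAG>test-abc-test-abc-<TAG>test-test-</TAG>
        System.out.println(removeInsideTags(input));
    }
}
```

The abc- occurrences between </TAG> and the next <TAG> are left alone. Group 2 cannot cross a delimiter, so the \G chain breaks at </TAG> and the next match has to start from a fresh <TAG>.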