How to match a string with a regex only if it's between two delimiters?

My goal is to delete all matches from an input using a regular expression with Java 7:

input.replaceAll([regex], "");

Given this example input with a target string abc-:

<TAG>test-test-abc-abc-test-abc-test-</TAG>test-abc-test-abc-<TAG>test-abc-test-abc-abc-</TAG>

What regex could I use in the code above to match abc- only when it is between the <TAG> and </TAG> delimiters? Here is the desired matching behaviour, with <--> marking each match:

               <--><-->     <-->                                       <-->     <--><-->
<TAG>test-test-abc-abc-test-abc-test-</TAG>test-abc-test-abc-<TAG>test-abc-test-abc-abc-</TAG>

Expected result:

<TAG>test-test-test-test-</TAG>test-abc-test-abc-<TAG>test-test-</TAG>

The left and right delimiters are always different. I am not particularly looking for a recursive solution (nested delimiters).

I think this might be doable with lookaheads and/or lookbehinds but I didn't get anywhere with them.

You can use a regex like

(?s)(\G(?!^)|<TAG>(?=.*?</TAG>))((?:(?!<TAG>|</TAG>).)*?)abc-

See the regex demo. Replace with $1$2. Details:

  • (?s) - a Pattern.DOTALL embedded flag option, so that . also matches line breaks
  • (\G(?!^)|<TAG>(?=.*?</TAG>)) - Group 1 ($1): either of the two:
    • \G(?!^) - the end of the previous successful match (the (?!^) lookahead excludes the start of the string, where \G would otherwise also match)
    • | - or
    • <TAG>(?=.*?</TAG>) - <TAG> that is followed by any zero or more chars, as few as possible, and then </TAG> (thus, we make sure the closing, right-hand delimiter actually appears further in the string)
  • ((?:(?!<TAG>|</TAG>).)*?) - Group 2 ($2): any one char (.), zero or more repetitions but as few as possible (*?), that does not start a <TAG> or </TAG> char sequence (aka a tempered greedy token)
  • abc- - the pattern to be removed, abc-.
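
To see how the two alternatives in Group 1 chain matches together, here is a small sketch that walks the sample input with Matcher.find() and records what each match captured. The class and method names (TagMatchTrace, trace) are made up for illustration and are not part of the original answer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TagMatchTrace {
    // Same regex as above; only the Java string escaping is added.
    static final Pattern P = Pattern.compile(
        "(?s)(\\G(?!^)|<TAG>(?=.*?</TAG>))((?:(?!<TAG>|</TAG>).)*?)abc-");

    // Hypothetical helper: one row per match, with its span and both groups.
    static List<String> trace(String text) {
        List<String> rows = new ArrayList<String>();
        Matcher m = P.matcher(text);
        while (m.find()) {
            rows.add(m.start() + "-" + m.end()
                + " g1=[" + m.group(1) + "] g2=[" + m.group(2) + "]");
        }
        return rows;
    }

    public static void main(String[] args) {
        String input = "<TAG>test-test-abc-abc-test-abc-test-</TAG>"
            + "test-abc-test-abc-"
            + "<TAG>test-abc-test-abc-abc-</TAG>";
        for (String row : trace(input)) {
            System.out.println(row);
        }
    }
}
```

On the sample input this reports six matches. The two whose Group 1 is <TAG> open a delimited region; the others have an empty Group 1 and start exactly where the previous match ended, which is the \G branch. No match starts between </TAG> and the next <TAG>, because the tempered greedy token refuses to step over a delimiter.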

In Java:

String pattern = "(?s)(\\G(?!^)|<TAG>(?=.*?</TAG>))((?:(?!<TAG>|</TAG>).)*?)abc-";
String result = text.replaceAll(pattern, "$1$2");
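
For delimiters or targets other than the ones in the question, the same pattern can be assembled from parts. The following is a minimal sketch under that assumption: the class DelimitedRemover and the method removeBetween are hypothetical names, and Pattern.quote is used so that the delimiters and the target are treated as literal text. As in the question, it assumes the delimiters are not nested.

```java
import java.util.regex.Pattern;

public class DelimitedRemover {
    // Hypothetical helper: removes every occurrence of target that lies
    // between an open delimiter and its closing delimiter (no nesting).
    static String removeBetween(String text, String open, String close, String target) {
        String o = Pattern.quote(open);   // literal opening delimiter
        String c = Pattern.quote(close);  // literal closing delimiter
        String t = Pattern.quote(target); // literal text to remove
        String regex = "(?s)(\\G(?!^)|" + o + "(?=.*?" + c + "))"
                     + "((?:(?!" + o + "|" + c + ").)*?)" + t;
        // Keep Group 1 and Group 2, drop the target itself.
        return text.replaceAll(regex, "$1$2");
    }

    public static void main(String[] args) {
        String input = "<TAG>test-test-abc-abc-test-abc-test-</TAG>"
            + "test-abc-test-abc-"
            + "<TAG>test-abc-test-abc-abc-</TAG>";
        String expected = "<TAG>test-test-test-test-</TAG>"
            + "test-abc-test-abc-"
            + "<TAG>test-test-</TAG>";
        String result = removeBetween(input, "<TAG>", "</TAG>", "abc-");
        System.out.println(result);
        System.out.println(result.equals(expected)); // compares against the expected result from the question
    }
}
```

With "<TAG>", "</TAG>" and "abc-" as arguments, the assembled regex is equivalent to the hard-coded pattern above, apart from the \Q...\E quoting that Pattern.quote adds.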
