
How to match a regex only if it's not inside of a tag

I'm trying to match '<TAG2>' only if it's not inside of <TAG>.

For example:

This is a WORD --- Match
<TAG><TAG2>xxx</TAG2></TAG> --- Not a match
<TAG>xxxxxxx<TAG2>yyyy</TAG2>xxxxxxx</TAG>  --- Not a match

I'm using PHP, so I can't do a variable-length negative look-behind.

I tried using the regex from "Match text not inside span tags", but it doesn't work in my case if there are multiple tags.

<TAG><TAG2>xxx</TAG2></TAG>
<TAG><TAG2>xxx</TAG2></TAG>

This will match from the first <TAG2> to the end of the second </TAG2>. I'm assuming this is because my regex includes <TAG2>[\s\S]*</TAG2>
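The over-matching described above can be reproduced with just the fragment quoted in the question. The sketch below uses Python's `re` module purely for illustration (the question is about PHP, and the full original regex is not shown, so only the `<TAG2>[\s\S]*</TAG2>` fragment is tested). It compares the greedy `*` with the lazy `*?`:

```python
import re

# The two-line input from the question.
text = "<TAG><TAG2>xxx</TAG2></TAG>\n<TAG><TAG2>xxx</TAG2></TAG>"

# Greedy: [\s\S]* grabs as much as possible, so a single match runs
# from the first <TAG2> to the last </TAG2>, across the line break.
greedy = re.findall(r"<TAG2>[\s\S]*</TAG2>", text)

# Lazy: [\s\S]*? stops at the nearest </TAG2>, giving one match per element.
lazy = re.findall(r"<TAG2>[\s\S]*?</TAG2>", text)

print(len(greedy), len(lazy))
```

This only illustrates why the match spans both lines; it does not by itself exclude <TAG2> elements that sit inside <TAG>.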

Foreword

I recommend using a parsing engine for this; however, it sounds like you have creative control over the complexity of your HTML. So as long as you do not have complex nesting situations or other odd edge cases, this should work.
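For comparison, here is a rough sketch of what the parser-based route could look like. It uses Python's standard `html.parser` for illustration rather than a PHP parser, and the class name `Tag2Collector` and helper `tag2_text_outside_tag` are hypothetical names invented for this example. It assumes the markup is reasonably well formed and that <TAG2> elements are not nested inside each other.

```python
from html.parser import HTMLParser


class Tag2Collector(HTMLParser):
    """Collects the text of <tag2> elements that are not inside <tag>."""

    def __init__(self):
        super().__init__()
        self.tag_depth = 0    # how many <tag> elements are currently open
        self.in_tag2 = False  # True while inside a top-level <tag2>
        self.buffer = []
        self.found = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser reports tag names in lowercase.
        if tag == "tag":
            self.tag_depth += 1
        elif tag == "tag2" and self.tag_depth == 0:
            self.in_tag2 = True
            self.buffer = []

    def handle_endtag(self, tag):
        if tag == "tag" and self.tag_depth > 0:
            self.tag_depth -= 1
        elif tag == "tag2" and self.in_tag2:
            self.found.append("".join(self.buffer))
            self.in_tag2 = False

    def handle_data(self, data):
        if self.in_tag2:
            self.buffer.append(data)


def tag2_text_outside_tag(markup):
    parser = Tag2Collector()
    parser.feed(markup)
    parser.close()
    return parser.found
```

Because the parser tracks how many <TAG> elements are open, nested <TAG> elements are handled by the depth counter, which is the kind of case a single regex struggles with.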

Description

(<tag2>.*?</tag2>)|<tag>(?:(?!<tag\s?>).)*

(Image: regular expression visualization)

This regular expression will do the following:

  • populate capture group 1 with <tag2>...</tag2>, provided this tag is not already enclosed inside <tag>...</tag>, as in <tag>.<tag2>..</tag2>.</tag>
  • also match every run that starts at <tag> and continues until the next <tag> or the end of the line, but where this match occurs capture group 1 will have no value.
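The two points above amount to "match everything, keep only the matches where group 1 is set". A minimal sketch of that filtering step follows, written with Python's `re` for illustration. The function name `tag2_outside_tag` is hypothetical, and the case-insensitive flag is an assumption inferred from the lowercase pattern being applied to uppercase tags in the sample text.

```python
import re

# The pattern from the answer, unchanged. IGNORECASE is assumed so that
# the lowercase pattern also matches <TAG> and <TAG2>.
PATTERN = re.compile(
    r"(<tag2>.*?</tag2>)|<tag>(?:(?!<tag\s?>).)*",
    re.IGNORECASE,
)


def tag2_outside_tag(text):
    """Return the <tag2>...</tag2> matches that were captured in group 1."""
    return [
        m.group(1)
        for m in PATTERN.finditer(text)
        if m.group(1) is not None  # group 1 is unset for the <tag>... branch
    ]


sample = (
    "This <tag2>is a WORD</tag2> --- Match\n"
    "<TAG><TAG2>xxx</TAG2></TAG> --- Not a match\n"
    "<TAG>xxxxxxx<TAG2>yyyy</TAG2>xxxxxxx</TAG>  --- Not a match"
)
print(tag2_outside_tag(sample))  # → ['<tag2>is a WORD</tag2>']
```

The second branch works by consuming the <tag> text first, so the engine never gets a chance to start a group 1 match inside it.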

Example

Live Demo

https://regex101.com/r/uQ7xR5/1

Sample text

This <tag2>is a WORD</tag2> --- Match
<TAG><TAG2>xxx</TAG2></TAG> --- Not a match
<TAG>xxxxxxx<TAG2>yyyy</TAG2>xxxxxxx</TAG>  --- Not a match

Sample Matches

Note how capture group 1 is only populated by the <tag2>...</tag2> that is not encapsulated inside <tag>..</tag>.

[0][0] = <tag2>is a WORD</tag2>
[0][1] = <tag2>is a WORD</tag2>

[1][0] = <TAG><TAG2>xxx</TAG2></TAG> --- Not a match
[1][1] = 

[2][0] = <TAG>xxxxxxx<TAG2>yyyy</TAG2>xxxxxxx</TAG>  --- Not a match
[2][1] = 

Explanation

NODE                     EXPLANATION
----------------------------------------------------------------------
  (                        group and capture to \1:
----------------------------------------------------------------------
    <tag2>                   '<tag2>'
----------------------------------------------------------------------
    .*?                      any character except \n (0 or more times
                             (matching the least amount possible))
----------------------------------------------------------------------
    </tag2>                  '</tag2>'
----------------------------------------------------------------------
  )                        end of \1
----------------------------------------------------------------------
 |                        OR
----------------------------------------------------------------------
  <tag>                    '<tag>'
----------------------------------------------------------------------
  (?:                      group, but do not capture (0 or more times
                           (matching the most amount possible)):
----------------------------------------------------------------------
    (?!                      look ahead to see if there is not:
----------------------------------------------------------------------
      <tag                     '<tag'
----------------------------------------------------------------------
      \s?                      whitespace (\n, \r, \t, \f, and " ")
                               (optional (matching the most amount
                               possible))
----------------------------------------------------------------------
      >                        '>'
----------------------------------------------------------------------
    )                        end of look-ahead
----------------------------------------------------------------------
    .                        any character except \n
----------------------------------------------------------------------
  )*                       end of grouping
----------------------------------------------------------------------
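The same breakdown can be kept next to the pattern itself by writing it in free-spacing (verbose) mode. The sketch below does this with Python's `re.VERBOSE` for illustration; the name `VERBOSE_PATTERN` is hypothetical, and the case-insensitive flag is an assumption based on the sample text using uppercase tags.

```python
import re

VERBOSE_PATTERN = re.compile(
    r"""
    (                   # capture group 1:
        <tag2>          #   literal '<tag2>'
        .*?             #   as few characters as possible (no newlines)
        </tag2>         #   literal '</tag2>'
    )
    |                   # OR
    <tag>               # literal '<tag>'
    (?:                 # non-capturing group, repeated 0 or more times:
        (?!<tag\s?>)    #   fail if '<tag>' (optionally with one whitespace
                        #   character before '>') starts here
        .               #   otherwise consume any character except newline
    )*
    """,
    re.IGNORECASE | re.VERBOSE,
)
```

In verbose mode unescaped whitespace and `#` comments are ignored, so this compiles to the same expression as the one-line form.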
