

Regex formula not looking inside HTML tags

My regex pattern works for all text that is not enclosed in HTML tags:

((?<!-)\btest(?!-)\b)(?=[^<>]*(?:<\w|$))

In the example below, I need it to find both instances of 'test' in these two strings, but it only matches the first one:

vdsv ds test dsv sdlvk 
<b>dsjn vkjsd test sv</b>
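To see why the original pattern misses the second string, it can be run against both samples. This is a sketch in TypeScript rather than .NET, assuming an engine with ES2018 lookbehind support (for example Node.js 10 or later); the variable names are illustrative only.

```typescript
// Assumes an ECMAScript 2018+ engine (e.g. Node.js 10 or later) for the
// (?<!-) lookbehind. Backslashes are doubled because this is a plain string.
const originalPattern = new RegExp(
  "((?<!-)\\btest(?!-)\\b)(?=[^<>]*(?:<\\w|$))"
);

const questionSamples: string[] = [
  "vdsv ds test dsv sdlvk ",
  "<b>dsjn vkjsd test sv</b>",
];

// The final lookahead accepts "test" only if the next "<" opens a tag (<\w)
// or no "<" follows at all ($). In the second sample the next "<" starts
// the closing tag "</b>", so the match is rejected.
const originalResults: boolean[] = questionSamples.map((s) =>
  originalPattern.test(s)
);

console.log(originalResults); // [ true, false ]
```

In other words, the lookahead only allows a match when the word is followed by an opening tag or the end of the input, which is why text sitting just before a closing tag is skipped.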

In .NET, you can use an infinite-width lookbehind (one that may contain unbounded quantifiers such as *):

\b(?<!-)test\b(?<!<[^<>]*)(?!-|[^<>]*>)

See the .NET regex demo.

In C# code:

// Verbatim string (@"...") so each backslash reaches the regex engine as-is
var pattern = @"\b(?<!-)test\b(?<!<[^<>]*)(?!-|[^<>]*>)";
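The C# line above only declares the pattern. As a cross-check outside .NET, ECMAScript 2018 regexes also accept variable-length lookbehind, so the same expression can be tried in TypeScript. This is a minimal sketch assuming Node.js 10 or later; the sample array and counting logic are illustrative, not part of the original answer. Note that an ordinary string passed to new RegExp needs doubled backslashes, which is exactly what the C# @"..." verbatim string avoids.

```typescript
// Assumes an ECMAScript 2018+ engine (e.g. Node.js 10 or later), whose
// lookbehind, like .NET's, may contain unbounded quantifiers such as [^<>]*.
// In an ordinary string every backslash is doubled; C#'s @"..." avoids that.
const answerPattern = new RegExp(
  "\\b(?<!-)test\\b(?<!<[^<>]*)(?!-|[^<>]*>)",
  "g"
);

const demoSamples: string[] = [
  "vdsv ds test dsv sdlvk ",
  "<b>dsjn vkjsd test sv</b>",
];

// With the "g" flag, String.prototype.match returns every match (or null).
const matchCounts: number[] = demoSamples.map((s) => {
  const found = s.match(answerPattern);
  return found === null ? 0 : found.length;
});

console.log(matchCounts); // [ 1, 1 ]
```

Both samples now yield one match each, including the 'test' inside the <b> element.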

Details:

  • \b - word boundary
  • (?<!-) - a negative lookbehind that fails the match if there is a - immediately to the left of the current location
  • test - the literal word test
  • \b - word boundary
  • (?<!<[^<>]*) - a negative lookbehind that fails the match if there is a < followed by zero or more characters other than < and > immediately to the left of the current location (that is, the match would be inside a tag)
  • (?!-|[^<>]*>) - a negative lookahead that fails the match if there is a -, or zero or more characters other than < and > followed by a >, immediately to the right of the current location
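Each clause listed above can be checked with a small input chosen to trip exactly that condition. This is a minimal TypeScript sketch rather than C#, assuming an ES2018-capable engine such as Node.js 10 or later; the test strings are made up for illustration.

```typescript
// Assumes an ECMAScript 2018+ engine for the variable-length lookbehind.
// No "g" flag, so RegExp.prototype.test keeps no lastIndex state between calls.
const clausePattern = new RegExp("\\b(?<!-)test\\b(?<!<[^<>]*)(?!-|[^<>]*>)");

// Each input is paired with the part of the pattern it exercises.
const clauseCases: Array<[string, string]> = [
  ["a test here", "plain text: match"],
  ["<b>test</b>", "element content: match"],
  ["test-case", "(?!-) rejects a trailing hyphen"],
  ["unit-test", "(?<!-) rejects a leading hyphen"],
  ["testing", "the second word boundary rejects a longer word"],
  ['<a title="test">', "(?<!<[^<>]*) rejects text inside a tag"],
];

const clauseResults: boolean[] = clauseCases.map(([input]) =>
  clausePattern.test(input)
);

clauseCases.forEach(([input, note], i) => {
  console.log(`${clauseResults[i]}\t${input}\t${note}`);
});
```

Only the first two inputs match; the remaining four are each rejected by the clause named next to them.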
