Regex formula not looking inside HTML tags
My regex pattern works for all text not contained within HTML tags:
((?<!-)\btest(?!-)\b)(?=[^<>]*(?:<\w|$))
In the example below I need it to find both instances of 'test' in these two strings:
vdsv ds test dsv sdlvk
<b>dsjn vkjsd test sv</b>
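To see where the pattern falls short, here is a minimal check, assuming .NET's System.Text.RegularExpressions.Regex (the variable names are illustrative and not part of the original post). The trailing lookahead accepts only the end of the input or a < followed by a word character, so the </b> closing tag that follows 'test' in the second string rules that match out.

```csharp
using System;
using System.Text.RegularExpressions;

// The pattern from the question, unchanged.
var original = new Regex(@"((?<!-)\btest(?!-)\b)(?=[^<>]*(?:<\w|$))");

// No tags: [^<>]* runs to the end of the string and '$' succeeds.
bool plainFound = original.IsMatch("vdsv ds test dsv sdlvk");

// After "test sv" comes "</b>": '<' is followed by '/', not a word character,
// and the end of the string has not been reached, so the lookahead fails.
bool taggedFound = original.IsMatch("<b>dsjn vkjsd test sv</b>");

Console.WriteLine($"plain: {plainFound}, tagged: {taggedFound}");
```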
In .NET, you may leverage an infinite-width lookbehind:
\b(?<!-)test\b(?<!<[^<>]*)(?!-|[^<>]*>)
See the .NET regex demo
In code:
var pattern = @"\b(?<!-)test\b(?<!<[^<>]*)(?!-|[^<>]*>)";
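As a rough usage sketch, the pattern can be run against the two strings from the question with Regex.Matches. The inputs array and the counts and firstIndexes bookkeeping are illustrative names added here, not part of the original answer.

```csharp
using System;
using System.Text.RegularExpressions;

var pattern = @"\b(?<!-)test\b(?<!<[^<>]*)(?!-|[^<>]*>)";

// The two sample strings from the question.
string[] inputs = { "vdsv ds test dsv sdlvk", "<b>dsjn vkjsd test sv</b>" };

var counts = new int[inputs.Length];
var firstIndexes = new int[inputs.Length];

for (int i = 0; i < inputs.Length; i++)
{
    MatchCollection matches = Regex.Matches(inputs[i], pattern);
    counts[i] = matches.Count;
    // -1 marks "no match" so the arrays stay aligned with the inputs.
    firstIndexes[i] = matches.Count > 0 ? matches[0].Index : -1;

    foreach (Match m in matches)
    {
        Console.WriteLine($"'{m.Value}' found at index {m.Index} in: {inputs[i]}");
    }
}
```

Both strings should yield exactly one match, since neither occurrence of 'test' sits between a < and its closing >.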
Details
\b - word boundary
(?<!-) - a negative lookbehind that fails the match if there is a - immediately to the left of the current location
test - the word test
\b - word boundary
(?<!<[^<>]*) - a negative lookbehind that fails the match if there is a < followed by any 0 or more chars other than < and > immediately to the left of the current location
(?!-|[^<>]*>) - a negative lookahead that fails the match if there is a -, or any 0 or more chars other than < and > followed by a >, immediately to the right of the current location.
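The role of each lookaround can be checked with a few hand-made inputs. This is only a sketch: the sample strings below are hypothetical and were chosen to trigger one rule each, and they do not come from the original post.

```csharp
using System;
using System.Text.RegularExpressions;

var regex = new Regex(@"\b(?<!-)test\b(?<!<[^<>]*)(?!-|[^<>]*>)");

// Hypothetical inputs, roughly one per lookaround.
string[] samples =
{
    "pre-test",                    // rejected by (?<!-): a '-' sits right before 'test'
    "test-case",                   // rejected by the '-' branch of the lookahead
    "<test>",                      // rejected by (?<!<[^<>]*): 'test' is inside a tag
    "<a title=\"test\">test</a>",  // attribute value is skipped, text content matches
};

var matchCounts = new int[samples.Length];
for (int i = 0; i < samples.Length; i++)
{
    matchCounts[i] = regex.Matches(samples[i]).Count;
    Console.WriteLine($"{samples[i]} -> {matchCounts[i]} match(es)");
}

// Position of the single match in the last sample (the text between the tags).
int textNodeIndex = regex.Match(samples[3]).Index;
```

In the last sample the first 'test' is dropped because the lookbehind can reach back to the opening < without crossing a >, while the second one follows a > and is therefore treated as plain text.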