Regex formula not looking inside HTML tags

My regex pattern works for all text not contained within HTML tags:

((?<!-)\btest(?!-)\b)(?=[^<>]*(?:<\w|$))

In the example below I need it to find both instances of 'test' in these two strings:

vdsv ds test dsv sdlvk 
<b>dsjn vkjsd test sv</b>

In .NET, you can use an infinite-width lookbehind:

\b(?<!-)test\b(?<!<[^<>]*)(?!-|[^<>]*>)

See the .NET regex demo

In code:

var pattern = @"\b(?<!-)test\b(?<!<[^<>]*)(?!-|[^<>]*>)";

Details

  • \b - word boundary
  • (?<!-) - a negative lookbehind that fails the match if there is a - immediately to the left of the current location
  • test - word test
  • \b - word boundary
  • (?<!<[^<>]*) - a negative lookbehind that fails the match if there is a < and any 0 or more chars other than < and > immediately to the left of the current location
  • (?!-|[^<>]*>) - a negative lookahead that fails the match if there is a -, or any 0 or more chars other than < and > followed by a >, immediately to the right of the current location.
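
To check the pattern against the strings from the question, a minimal console sketch might look like the following. The first two inputs come from the question; the last two (a tag attribute and hyphenated words) are hypothetical additions to exercise the lookarounds, and the class and variable names are illustrative only.

```csharp
using System;
using System.Text.RegularExpressions;

class Program
{
    static void Main()
    {
        // Pattern copied verbatim from the answer above.
        var pattern = @"\b(?<!-)test\b(?<!<[^<>]*)(?!-|[^<>]*>)";

        string[] inputs =
        {
            "vdsv ds test dsv sdlvk ",      // from the question: plain text
            "<b>dsjn vkjsd test sv</b>",    // from the question: text between tags
            "<a title=\"test\">link</a>",   // hypothetical: 'test' inside a tag
            "pre-test and test-post"        // hypothetical: hyphen-adjacent words
        };

        foreach (var input in inputs)
        {
            // Count how many times the pattern matches in each input.
            int count = Regex.Matches(input, pattern).Count;
            Console.WriteLine($"{count}: {input}");
        }
        // Expected counts: 1, 1, 0, 0
    }
}
```

Note that this relies on .NET's support for variable-width lookbehind; many other regex engines reject `(?<!<[^<>]*)`.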
