My regex pattern works for all text not contained within HTML tags:
((?<!-)\btest(?!-)\b)(?=[^<>]*(?:<\w|$))
In the example below I need it to find both instances of 'test' in these two strings:
vdsv ds test dsv sdlvk
<b>dsjn vkjsd test sv</b>
In .NET, you may leverage an infinite width lookbehind:
\b(?<!-)test\b(?<!<[^<>]*)(?!-|[^<>]*>)
In code:
var pattern = @"\b(?<!-)test\b(?<!<[^<>]*)(?!-|[^<>]*>)";
Details
\b - a word boundary
(?<!-) - a negative lookbehind that fails the match if there is a - immediately to the left of the current location
test - the word test
\b - a word boundary
(?<!<[^<>]*) - a negative lookbehind that fails the match if there is a < followed by zero or more chars other than < and > immediately to the left of the current location
(?!-|[^<>]*>) - a negative lookahead that fails the match if there is a - or zero or more chars other than < and > followed by a > immediately to the right of the current location.
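As a quick check, here is a minimal sketch that runs the pattern against the two strings from the question. The last two inputs (a tag attribute and hyphenated words) are made up for illustration and are not from the question.

```csharp
using System;
using System.Text.RegularExpressions;

class Program
{
    static void Main()
    {
        // Pattern from the answer: a standalone "test" that is not hyphenated
        // and not located inside an HTML tag.
        var pattern = @"\b(?<!-)test\b(?<!<[^<>]*)(?!-|[^<>]*>)";

        string[] inputs =
        {
            "vdsv ds test dsv sdlvk",        // from the question: plain text
            "<b>dsjn vkjsd test sv</b>",     // from the question: text between tags
            "<a title=\"test\">link</a>",    // illustrative: "test" inside a tag
            "pre-test and test-case",        // illustrative: hyphenated on either side
        };

        foreach (var input in inputs)
        {
            // Count the matches found in each input string.
            int count = Regex.Matches(input, pattern).Count;
            Console.WriteLine($"{count}: {input}");
        }
    }
}
```

This should print a count of 1 for each of the two strings from the question and 0 for the two illustrative ones. Note that the pattern treats any < ... > span as a tag, so a stray < or > in ordinary text can change the result. For arbitrary HTML, a real parser is usually the safer choice.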