C# Regex Find Phrase in HTML exclude specific tags
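I have HTML text and a specific phrase. I need to find the phrase in the HTML text and highlight it, but I must skip it when it appears inside an h or a tag.
For example, this is my phrase: "phrase to highlight"
And this is my HTML text: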
<p>Here starts text and here is phrase to highlight</p>
<a>here, phrase to highlight, supposed to be skipped</a>
<h3>here, phrase to highlight, supposed to be skipped</h3>
<div class="phrase to highlight">Here phrase to highlight must be highlighted again</div>
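The phrase should be highlighted in the p and div tags, and skipped in the a tag and in any h tag.
I wrote a negative lookbehind to find the phrase and make sure it is not part of an HTML attribute or entity: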
var Pattern = $@"(?i)(?<!</?[^>]*|&[^;]*)(\bphrase to highlight\b)";
How can I modify my regex to also exclude a and h tags?
If no other tags are nested inside the a and h tags, use
(?<!</?[^>]*|&[^;]*|<(?:a|h\d)(?:\s[^>]*)?>[^<]*)(\bphrase to highlight\b)
See proof.
Explanation
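To show how the pattern might be applied, here is a minimal sketch that wraps each allowed match in a <mark> element. The class name PhraseHighlighter, the method Highlight, and the choice of <mark> as the highlight markup are assumptions made for illustration; they do not come from the question. The (?i) flag from the question's pattern is kept so that matching stays case-insensitive.

```csharp
using System;
using System.Text.RegularExpressions;

public static class PhraseHighlighter
{
    // Hypothetical helper: wraps every allowed occurrence of `phrase` in <mark>.
    // Relies on .NET's support for variable-length lookbehind.
    public static string Highlight(string html, string phrase)
    {
        // Regex.Escape keeps any metacharacters in the phrase literal.
        string pattern =
            @"(?i)(?<!</?[^>]*|&[^;]*|<(?:a|h\d)(?:\s[^>]*)?>[^<]*)(\b"
            + Regex.Escape(phrase)
            + @"\b)";

        // $1 is the captured phrase, so the original casing is preserved.
        return Regex.Replace(html, pattern, "<mark>$1</mark>");
    }

    public static void Main()
    {
        string html = string.Join("\n",
            "<p>Here starts text and here is phrase to highlight</p>",
            "<a>here, phrase to highlight, supposed to be skipped</a>",
            "<h3>here, phrase to highlight, supposed to be skipped</h3>",
            "<div class=\"phrase to highlight\">Here phrase to highlight must be highlighted again</div>");

        Console.WriteLine(Highlight(html, "phrase to highlight"));
    }
}
```

With the sample HTML from the question, only the occurrence in the p element and the text occurrence in the div element are wrapped; the a and h3 lines and the class attribute value are left unchanged.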
--------------------------------------------------------------------------------
  (?<!                     look behind to see if there is not:
--------------------------------------------------------------------------------
    <                        '<'
--------------------------------------------------------------------------------
    /?                       '/' (optional (matching the most amount
                             possible))
--------------------------------------------------------------------------------
    [^>]*                    any character except: '>' (0 or more
                             times (matching the most amount
                             possible))
--------------------------------------------------------------------------------
   |                        OR
--------------------------------------------------------------------------------
    &                        '&'
--------------------------------------------------------------------------------
    [^;]*                    any character except: ';' (0 or more
                             times (matching the most amount
                             possible))
--------------------------------------------------------------------------------
   |                        OR
--------------------------------------------------------------------------------
    <                        '<'
--------------------------------------------------------------------------------
    (?:                      group, but do not capture:
--------------------------------------------------------------------------------
      a                        'a'
--------------------------------------------------------------------------------
     |                        OR
--------------------------------------------------------------------------------
      h                        'h'
--------------------------------------------------------------------------------
      \d                       digits (0-9)
--------------------------------------------------------------------------------
    )                        end of grouping
--------------------------------------------------------------------------------
    (?:                      group, but do not capture (optional
                             (matching the most amount possible)):
--------------------------------------------------------------------------------
      \s                       whitespace (\n, \r, \t, \f, and " ")
--------------------------------------------------------------------------------
      [^>]*                    any character except: '>' (0 or more
                               times (matching the most amount
                               possible))
--------------------------------------------------------------------------------
    )?                       end of grouping
--------------------------------------------------------------------------------
    >                        '>'
--------------------------------------------------------------------------------
    [^<]*                    any character except: '<' (0 or more
                             times (matching the most amount
                             possible))
--------------------------------------------------------------------------------
  )                        end of look-behind
--------------------------------------------------------------------------------
  (                        group and capture to \1:
--------------------------------------------------------------------------------
    \b                       the boundary between a word char (\w)
                             and something that is not a word char
--------------------------------------------------------------------------------
    phrase to highlight      'phrase to highlight'
--------------------------------------------------------------------------------
    \b                       the boundary between a word char (\w)
                             and something that is not a word char
--------------------------------------------------------------------------------
  )                        end of \1
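The "no nested tags" condition follows from the [^<]* part of the lookbehind: it can only reach back to an opening a or h tag when no other '<' sits in between. The sketch below illustrates that limitation; the class name NestedTagDemo and the helper CountMatches are hypothetical names used only for this demonstration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class NestedTagDemo
{
    // Same pattern as in the answer, with (?i) added as in the question.
    private const string Pattern =
        @"(?i)(?<!</?[^>]*|&[^;]*|<(?:a|h\d)(?:\s[^>]*)?>[^<]*)(\bphrase to highlight\b)";

    // Hypothetical helper: number of occurrences the pattern would highlight.
    public static int CountMatches(string html) =>
        Regex.Matches(html, Pattern).Count;

    public static void Main()
    {
        // Phrase sits directly inside <a>: the lookbehind sees "<a>" and rejects it.
        Console.WriteLine(CountMatches("<a>phrase to highlight</a>"));            // 0

        // A nested <b> puts a '<' between "<a>" and the phrase, so [^<]* cannot
        // reach back to "<a>" and the phrase is matched even though it is in a link.
        Console.WriteLine(CountMatches("<a>see <b>phrase to highlight</b></a>")); // 1
    }
}
```

If the input can contain nested markup inside a or h elements, an HTML parser that walks text nodes and checks their ancestors is likely a safer choice than extending the regex.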