
C# Regex: Find a Phrase in HTML, Excluding Specific Tags

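I have some HTML text and a specific phrase. I need to find the phrase in the HTML text and highlight it, but I must skip it whenever it appears inside an a tag or any h (heading) tag.

For example, this is my phrase: "phrase to highlight"

And this is my HTML text: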

<p>Here starts text and here is phrase to highlight</p>
<a>here, phrase to highlight, supposed to be skipped</a>
<h3>here, phrase to highlight, supposed to be skipped</h3>
<div class="phrase to highlight">Here phrase to highlight must be highlighted again</div>

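The phrase should be highlighted inside the p and div tags, and skipped inside the a tag and any h tag.

I wrote a negative lookbehind that finds my phrase and makes sure it is not part of an HTML tag (for example an attribute value) or an HTML entity: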

var Pattern = @"(?i)(?<!</?[^>]*|&[^;]*)(\bphrase to highlight\b)";
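A note on C# string literals that matters for a pattern like this one: \b only reaches the regex engine as a word boundary when the literal is verbatim (prefixed with @). In a regular or plain interpolated literal, \b is the backspace character, so the pattern compiles but never matches ordinary text. The sketch below illustrates the difference; the class and method names are hypothetical and exist only for this illustration.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical demo class, not part of the original question.
public static class WordBoundaryLiteralDemo
{
    // Regular literal: the C# compiler turns "\b" into the single
    // backspace character (U+0008) before the regex engine sees it.
    public static bool MatchesWithRegularLiteral(string input)
    {
        return Regex.IsMatch(input, "\bphrase\b");
    }

    // Verbatim literal: the backslash is preserved, so the regex
    // engine receives \b and treats it as a word boundary.
    public static bool MatchesWithVerbatimLiteral(string input)
    {
        return Regex.IsMatch(input, @"\bphrase\b");
    }
}
```

For an input such as "a phrase here", only the verbatim version is expected to match.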

How can I modify my regex to also exclude the a and h tags?

If no other tags are nested inside the a and h tags, use

(?<!</?[^>]*|&[^;]*|<(?:a|h\d)(?:\s[^>]*)?>[^<]*)(\bphrase to highlight\b)
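As a minimal sketch of how this pattern might be applied, the helper below wraps each allowed occurrence in a mark element. The class name PhraseHighlighter, the method Highlight, the choice of the mark tag, and the use of RegexOptions.IgnoreCase (standing in for the (?i) flag from the question) are all assumptions made for illustration, not part of the original answer.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper, assuming <mark> is an acceptable highlight tag.
public static class PhraseHighlighter
{
    public static string Highlight(string html, string phrase)
    {
        // Escape the phrase so characters such as '.' or '(' stay literal.
        string escaped = Regex.Escape(phrase);

        // Same lookbehind as above: skip matches inside a tag, inside an
        // entity, or directly inside the text of an <a> or <h1>..<h6> tag.
        string pattern =
            @"(?<!</?[^>]*|&[^;]*|<(?:a|h\d)(?:\s[^>]*)?>[^<]*)(\b"
            + escaped
            + @"\b)";

        // $1 re-inserts the captured phrase with its original casing.
        return Regex.Replace(html, pattern, "<mark>$1</mark>",
                             RegexOptions.IgnoreCase);
    }
}
```

Calling PhraseHighlighter.Highlight(html, "phrase to highlight") on the sample HTML from the question should wrap only the occurrences in the p text and the div text, leaving the a tag, the h3 tag, and the class attribute untouched.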


Explanation

--------------------------------------------------------------------------------
  (?<!                     look behind to see if there is not:
--------------------------------------------------------------------------------
    <                        '<'
--------------------------------------------------------------------------------
    /?                       '/' (optional (matching the most amount
                             possible))
--------------------------------------------------------------------------------
    [^>]*                    any character except: '>' (0 or more
                             times (matching the most amount
                             possible))
--------------------------------------------------------------------------------
   |                        OR
--------------------------------------------------------------------------------
    &                        '&'
--------------------------------------------------------------------------------
    [^;]*                    any character except: ';' (0 or more
                             times (matching the most amount
                             possible))
--------------------------------------------------------------------------------
   |                        OR
--------------------------------------------------------------------------------
    <                        '<'
--------------------------------------------------------------------------------
    (?:                      group, but do not capture:
--------------------------------------------------------------------------------
      a                        'a'
--------------------------------------------------------------------------------
     |                        OR
--------------------------------------------------------------------------------
      h                        'h'
--------------------------------------------------------------------------------
      \d                       digits (0-9)
--------------------------------------------------------------------------------
    )                        end of grouping
--------------------------------------------------------------------------------
    (?:                      group, but do not capture (optional
                             (matching the most amount possible)):
--------------------------------------------------------------------------------
      \s                       whitespace (\n, \r, \t, \f, and " ")
--------------------------------------------------------------------------------
      [^>]*                    any character except: '>' (0 or more
                               times (matching the most amount
                               possible))
--------------------------------------------------------------------------------
    )?                       end of grouping
--------------------------------------------------------------------------------
    >                        '>'
--------------------------------------------------------------------------------
    [^<]*                    any character except: '<' (0 or more
                             times (matching the most amount
                             possible))
--------------------------------------------------------------------------------
  )                        end of look-behind
--------------------------------------------------------------------------------
  (                        group and capture to \1:
--------------------------------------------------------------------------------
    \b                       the boundary between a word char (\w)
                             and something that is not a word char
--------------------------------------------------------------------------------
    phrase to highlight      'phrase to highlight'
--------------------------------------------------------------------------------
    \b                       the boundary between a word char (\w)
                             and something that is not a word char
--------------------------------------------------------------------------------
  )                        end of \1
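
The [^<]* step in the breakdown above is the reason this pattern only works when nothing is nested inside the a and h tags: the lookbehind cannot step over another '<', so a phrase wrapped in an inner tag is no longer recognized as being inside the outer a or h tag. The sketch below, using a hypothetical NestedTagCheck class, counts matches so that the difference can be observed.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical check class, written only to illustrate the limitation.
public static class NestedTagCheck
{
    private const string Pattern =
        @"(?<!</?[^>]*|&[^;]*|<(?:a|h\d)(?:\s[^>]*)?>[^<]*)(\bphrase to highlight\b)";

    // Returns how many occurrences the pattern would highlight.
    public static int CountMatches(string html)
    {
        return Regex.Matches(html, Pattern).Count;
    }
}
```

With this sketch, "<a>phrase to highlight</a>" yields no matches, while "<a><b>phrase to highlight</b></a>" yields one, because the inner b tag sits between the opening a tag and the phrase. For markup with nested tags, an HTML parser is likely a safer tool than a single regex.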

