
C# Regex find multiple HTML tags in one pattern

I have an arbitrary message (I don't know in advance what its content will be); however, I know that it may contain HTML tags such as <b> and <a href=> ..., and that no HTML tags other than these will appear. So I am looking for a pattern that can recognize bold markup and get its content, and also recognize a hyperlink and get its content. I already wrote this code:

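using System.Text.RegularExpressions;
using System.Windows.Documents; // Run, Bold (WPF)

string pattern = @"(<b>(.*?)</b>)|(<a href=.*?>(.*?)</a>)"; // lazy (.*?) so one match cannot span two <b> tags
int lastEnd = 0; // end of the previous match
Match match = Regex.Match(content, pattern);
while (match.Success)
{
    // plain text between the previous match and this one
    string before = content.Substring(lastEnd, match.Index - lastEnd);
    if (match.Groups[1].Success) // <b>...</b>: the inner text is Groups[2]
    {
        string boldText = match.Groups[2].Value;
        messageBlock.Dispatcher.Invoke(() =>
        {
            messageBlock.Inlines.Add(new Run(before));
            messageBlock.Inlines.Add(new Bold(new Run(boldText)));
        });
    }
    else if (match.Groups[3].Success) // <a href=...>...</a>: the link text is Groups[4]
    {
        // hyperlink case not handled yet
    }
    lastEnd = match.Index + match.Length;
    match = match.NextMatch(); // advance, otherwise the loop never ends
}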

However, with this pattern I can't retrieve the content of a match such as <a href=?> ...; it only works for the bold tag. Thank you.

For parsing HTML, it is better to use the Html Agility Pack.

Try:

@"(?s)<(?:(a)(?=\s)(?=(?:[^>""']|""[^""]*""|'[^']*')*?\shref\s*=(?:(['""])(.*?)\2))\s+(?:"".*?""|'.*?'|[^>]*?)+|b\s*)>(.*?)</(?(1)a|b)\s*>"

Where:

  • If group 1 matched, an a tag was found; otherwise a b tag was found.
  • Group 2 (the quote character around the href value) can be disregarded.
  • Group 3 contains the href value when group 1 matched.
  • Group 4 contains the tag's content.

PCRE demo (the pattern also works in C#)
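A minimal sketch of how the groups listed above can be read in C#, assuming the goal is to split the message into a flat list of plain, bold and link pieces. The pattern is copied unchanged from the answer; `MessageParser`, `Parse` and `Segment` are hypothetical names, not part of any library. Like the pattern itself, it assumes lowercase tags and a quoted href value.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical type: one piece of the message.
public sealed class Segment
{
    public Segment(string kind, string text, string href)
    {
        Kind = kind;
        Text = text;
        Href = href;
    }

    public string Kind { get; } // "text", "bold" or "link"
    public string Text { get; } // visible text
    public string Href { get; } // link target; empty unless Kind == "link"
}

// Hypothetical helper built around the answer's pattern.
public static class MessageParser
{
    // Group 1: "a" when a link matched; group 3: href value; group 4: inner content.
    private static readonly Regex TagRegex = new Regex(
        @"(?s)<(?:(a)(?=\s)(?=(?:[^>""']|""[^""]*""|'[^']*')*?\shref\s*=(?:(['""])(.*?)\2))\s+(?:"".*?""|'.*?'|[^>]*?)+|b\s*)>(.*?)</(?(1)a|b)\s*>");

    public static List<Segment> Parse(string content)
    {
        var segments = new List<Segment>();
        int lastEnd = 0; // end of the previous match

        foreach (Match m in TagRegex.Matches(content))
        {
            // Plain text between the previous tag and this one
            if (m.Index > lastEnd)
                segments.Add(new Segment("text", content.Substring(lastEnd, m.Index - lastEnd), ""));

            if (m.Groups[1].Success)
                segments.Add(new Segment("link", m.Groups[4].Value, m.Groups[3].Value));
            else
                segments.Add(new Segment("bold", m.Groups[4].Value, ""));

            lastEnd = m.Index + m.Length;
        }

        // Trailing plain text after the last tag
        if (lastEnd < content.Length)
            segments.Add(new Segment("text", content.Substring(lastEnd), ""));

        return segments;
    }
}
```

Tracking `lastEnd` keeps the plain text between tags, which the regex itself never captures. In WPF each piece could then become a `Run`, a `Bold`, or a `Hyperlink` whose `NavigateUri` is set from `Href`.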
