How to avoid HTML tags in between custom HTML tags using regular expressions in C#

Question

I have a requirement in which the count of '<H3></H3>' HTML tags needs to be found using RegEx in C#.

The following code finds the H3 tag count correctly, provided there are no custom HTML tags in between (i.e., the heading contains only text).

 var regexHeading = new Regex(@"<h3>(.*?)</h3>");

However, if the heading tag contains any custom HTML tag, the above RegEx does not work as expected (for example, <h3><a></a></h3>).

Can anyone suggest the best way to find the HTML tag count using a regular expression in C#, even if the tag contains custom tags in between?

Partial solution (may be helpful for someone): I wrote one custom tag, but it does not work in all scenarios.

Answer 1

Parsing HTML with regular expressions is not recommended; there are many answers about this on Stack Overflow.

Use HtmlAgilityPack instead.

Example: try this:

HtmlDocument doc = new HtmlDocument();
doc.Load("file.htm");
var h3nodes = doc.DocumentNode.SelectNodes("//body//h3");

or:

var h3nodes = doc.DocumentNode.Descendants("h3");

h3nodes is a list of the HTML elements with the tag "h3".

For the parameter of the "SelectNodes" method, please read about XPath.

Answer 2

If you are just looking to count the number of <H3> elements, then you only need to match the opening tag.

If you need to ensure that the element is well formed and has a matching closing tag, then your current RegEx should work. If you can tell us what you are expecting and the results you are getting, it will help us give you a better answer.
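
The "match only the opening tag" idea above can be sketched as follows. This is a minimal illustration, not the answerer's own code: the class name HeadingCounter, the method name CountOpeningH3, and the sample input are hypothetical. It also assumes that no attribute value contains a '>' character, since the pattern stops at the first '>'.

```csharp
using System;
using System.Text.RegularExpressions;

public static class HeadingCounter
{
    // Matches an opening <h3> tag, with or without attributes, in any letter case.
    // Assumption: attribute values do not contain '>'.
    private static readonly Regex OpeningH3 =
        new Regex(@"<h3\b[^>]*>", RegexOptions.IgnoreCase);

    // Hypothetical helper: returns the number of opening <h3> tags in the input.
    public static int CountOpeningH3(string html)
    {
        return OpeningH3.Matches(html).Count;
    }

    public static void Main()
    {
        // Sample input with nested tags, attributes, mixed case, and newlines.
        string html = "<H3><a href=\"#\">One</a></H3>\n<h3 class=\"x\">\nTwo\n</h3>";
        Console.WriteLine(CountOpeningH3(html)); // prints 2
    }
}
```

Because only the opening tag is matched, nested tags and line breaks inside the heading do not affect the count. The trade-off is that an unclosed <h3> is counted as well.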

Answer 3

Thanks Ethan Brown :-) The hint you gave resolved my issue.

The Regex is not able to match a heading that contains a newline, such as '<H3>\n</H3>', because by default '.' does not match '\n'.

So, I tried replacing the newline character with an empty string, as shown below:

publishingPageContent = publishingPageContent.Replace("\n", string.Empty);
var regexHeading = new Regex(@"<h3>(.*?)</h3>");
// Find all matching headings
var matchHeadings = regexHeading.Matches(publishingPageContent);
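
As an alternative to stripping newlines from the content, the regular expression itself can be told to let '.' match '\n' via RegexOptions.Singleline. The sketch below is an assumption-laden variant, not the poster's code: the class name HeadingMatcher, the method name CountHeadings, and the sample input are hypothetical, and RegexOptions.IgnoreCase is added on the assumption that the content may contain '<H3>' as well as '<h3>'.

```csharp
using System;
using System.Text.RegularExpressions;

public static class HeadingMatcher
{
    // Singleline lets '.' match '\n'; IgnoreCase accepts <H3> as well as <h3>.
    // The lazy (.*?) stops at the first closing tag, so nested tags are allowed.
    private static readonly Regex Heading = new Regex(
        @"<h3\b[^>]*>(.*?)</h3>",
        RegexOptions.Singleline | RegexOptions.IgnoreCase);

    // Hypothetical helper: counts complete <h3>...</h3> pairs in the content.
    public static int CountHeadings(string content)
    {
        return Heading.Matches(content).Count;
    }

    public static void Main()
    {
        // One heading holds only a newline; the other holds a nested tag and a newline.
        string content = "<H3>\n</H3><h3><a>Link</a>\n</h3>";
        Console.WriteLine(CountHeadings(content)); // prints 2
    }
}
```

This leaves the original content untouched, which matters if the same string is later written back to the page.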

Thanks, guys, for helping me figure out this issue! :)

3 answers

Answer 1
1 2015-05-20 16:19:43

Answer 2
0 2015-05-20 16:09:55

Answer 3
0 2015-06-02 10:00:18
