Regular expression for reading tags in HTML

Question

<td width="100%"><h1>Chicago, IL Weather</h1></td>

I want to get the text inside the h1 tag, and I want to do it with a regular expression in C#. Can anybody tell me the solution?

Answer 1

    // line holds the HTML to search; FileContent receives the extracted text.
    System.Text.RegularExpressions.Regex bodyRegex = new System.Text.RegularExpressions.Regex(@"<h1[^>]*>([\u0000-\uFFFF]+?)</h1>");
    System.Text.RegularExpressions.Match bodyMatch = bodyRegex.Match(line);
    if (bodyMatch.Success)
    {
        // Group 1 is the text between the tags, so this also works when <h1> carries attributes.
        FileContent = bodyMatch.Groups[1].Value;
    }

With this you get the text of the first h1 tag.

Answer 2

Give this a shot:

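    // Needs: using System; using System.Text.RegularExpressions;
    // Data is the string that holds the HTML.
    String h1Regex = "<h1[^>]*?>(?<TagText>.*?)</h1>";

    // Singleline lets '.' match newlines, so a heading that spans lines is still found.
    MatchCollection mc = Regex.Matches(Data, h1Regex, RegexOptions.Singleline);

    foreach (Match m in mc) {
        Console.WriteLine(m.Groups["TagText"].Value);
    }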
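
For anyone who wants to try the named-group approach end to end, here is a minimal sketch run against the HTML from the question. The names `H1Extractor` and `ExtractH1Texts` are made up for illustration, and the pattern is slightly adjusted (a `\b` after `h1` and `RegexOptions.IgnoreCase`), so treat it as one possible variant rather than the answer's exact code.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper class, not part of any library.
public static class H1Extractor
{
    // Returns the trimmed inner text of every <h1> element found in the input.
    public static List<string> ExtractH1Texts(string html)
    {
        var results = new List<string>();

        // \b keeps the pattern from matching tags such as <h10>.
        // Singleline makes '.' match newlines; IgnoreCase also accepts <H1>.
        MatchCollection matches = Regex.Matches(
            html,
            @"<h1\b[^>]*>(?<TagText>.*?)</h1>",
            RegexOptions.Singleline | RegexOptions.IgnoreCase);

        foreach (Match m in matches)
        {
            results.Add(m.Groups["TagText"].Value.Trim());
        }
        return results;
    }

    public static void Main()
    {
        string data = "<td width=\"100%\"><h1>Chicago, IL Weather</h1></td>";
        foreach (string text in ExtractH1Texts(data))
        {
            Console.WriteLine(text); // prints: Chicago, IL Weather
        }
    }
}
```

Note that this still breaks on nested or malformed markup, which is the usual limit of matching HTML with a regex.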

Answer 3

Why do you want to use Regex? I know it is the fastest way, but it has disadvantages too:

1. It hurts code readability.
2. If your HTML file changes, writing a new regex will be a great pain.

Unless you absolutely have to, drop regex and go for an HTML parser (like the above-mentioned HTMLAgilityPack).
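
To give an idea of what the parser route looks like, here is a rough sketch using the HtmlAgilityPack NuGet package. It assumes the package is installed and referenced; the class name `H1Parser` is just a placeholder, and the input is the snippet from the question.

```csharp
using System;
using HtmlAgilityPack; // third-party NuGet package, assumed to be installed

// Hypothetical demo class, not part of HtmlAgilityPack.
public static class H1Parser
{
    public static void Main()
    {
        string data = "<td width=\"100%\"><h1>Chicago, IL Weather</h1></td>";

        // Parse the markup into a document tree instead of pattern matching on it.
        var doc = new HtmlDocument();
        doc.LoadHtml(data);

        // SelectSingleNode takes an XPath expression and returns null when nothing matches.
        HtmlNode h1 = doc.DocumentNode.SelectSingleNode("//h1");
        if (h1 != null)
        {
            Console.WriteLine(h1.InnerText);
        }
    }
}
```

If the page layout changes later, only the XPath expression should need adjusting, which is the maintainability point made above.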

3 answers

Answer 1
3 2011-06-03 10:17:51

Answer 2
2 accepted 2011-06-03 12:55:51

Answer 3
0 2011-06-03 10:20:35
