
Regular expression to read tags in HTML

<td width="100%"><h1>Chicago, IL Weather</h1></td>

I want to get the text inside the h1 tag, and I want to do it with a regular expression in C#. Can anybody suggest a solution?

    // The capture group holds the text between the opening and closing h1 tags,
    // so it also works when the opening tag carries attributes.
    System.Text.RegularExpressions.Regex bodyRegex = new System.Text.RegularExpressions.Regex(@"<h1[^>]*>([\u0000-\uFFFF]+?)</h1>");
    System.Text.RegularExpressions.Match bodyMatch = bodyRegex.Match(line);
    if (bodyMatch.Success)
    {
        FileContent = bodyMatch.Groups[1].Value;
    }

With this you can get the text of the first h1 tag in the input.

Give it a shot

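// Requires: using System; using System.Text.RegularExpressions;
// Data is the string containing the HTML to search.
string h1Regex = "<h1[^>]*?>(?<TagText>.*?)</h1>";

MatchCollection mc = Regex.Matches(Data, h1Regex, RegexOptions.Singleline);

foreach (Match m in mc) {
    // The named group TagText holds the text between the tags.
    Console.WriteLine(m.Groups["TagText"].Value);
}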
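For reference, here is a minimal self-contained sketch of the same named-group approach, run against the sample line from the question. The class name H1Extractor and the method ExtractH1Texts are hypothetical names chosen for illustration, and adding RegexOptions.IgnoreCase (so that <H1> also matches) is an assumption, not part of the answer above.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class H1Extractor
{
    // Same pattern as in the answer above; IgnoreCase is an added assumption.
    private static readonly Regex H1Regex = new Regex(
        @"<h1[^>]*?>(?<TagText>.*?)</h1>",
        RegexOptions.Singleline | RegexOptions.IgnoreCase);

    // Hypothetical helper: returns the inner text of every h1 element found.
    public static List<string> ExtractH1Texts(string html)
    {
        var texts = new List<string>();
        foreach (Match m in H1Regex.Matches(html))
        {
            // The named group TagText is everything between <h1 ...> and </h1>.
            texts.Add(m.Groups["TagText"].Value.Trim());
        }
        return texts;
    }

    public static void Main()
    {
        // Sample line taken from the question.
        string html = "<td width=\"100%\"><h1>Chicago, IL Weather</h1></td>";
        foreach (string text in ExtractH1Texts(html))
        {
            Console.WriteLine(text); // prints "Chicago, IL Weather"
        }
    }
}
```

Note that the captured text is returned as-is, so nested tags or HTML entities inside the h1 would still need separate handling.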

Why do you want to use a regex? I know it is the fastest way, but it has disadvantages too:

  1. It hurts code readability.
  2. If your HTML file changes, it will be a great pain to write a new regex.

Unless you absolutely have to, drop the regex and use an HTML parser (like the HTMLAgilityPack mentioned above).
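As a rough sketch of what the parser route might look like, assuming the third-party HtmlAgilityPack NuGet package is installed: the class name H1ParserDemo and the method ExtractH1Texts are hypothetical names for illustration, and the XPath query //h1 is just one way to select the headings.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack; // third-party NuGet package, assumed to be installed

public static class H1ParserDemo
{
    // Hypothetical helper: returns the text of every h1 element in the HTML.
    public static List<string> ExtractH1Texts(string html)
    {
        var texts = new List<string>();
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // SelectNodes returns null (not an empty collection) when nothing matches.
        HtmlNodeCollection nodes = doc.DocumentNode.SelectNodes("//h1");
        if (nodes == null)
        {
            return texts;
        }

        foreach (HtmlNode node in nodes)
        {
            // InnerText drops nested tags; DeEntitize decodes entities such as &amp;.
            texts.Add(HtmlEntity.DeEntitize(node.InnerText).Trim());
        }
        return texts;
    }

    public static void Main()
    {
        // Sample line taken from the question.
        string html = "<td width=\"100%\"><h1>Chicago, IL Weather</h1></td>";
        foreach (string text in ExtractH1Texts(html))
        {
            Console.WriteLine(text);
        }
    }
}
```

Compared with the regex, this does not depend on how the opening tag is written, so attribute or whitespace changes in the markup should not require touching the code.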


 