
A regular expression for an HTML anchor tag in C#?

I need a regular expression in C# that matches anchor tags in HTML source code, and it should be as general as possible. Consider this HTML:

<a id="[constant]"
      href="[specific]"
    >GlobalPlatform Card Specification 2.2
    March, 2006</a>

By [constant] I mean the value is a constant string, so there is no problem with it. By [specific] I mean the address is a simple, specific string, so the regular expression for it is simple. The main problem is that I cannot handle the newline character in the middle of the anchor tag's title. I previously wrote this regular expression, which works well except that it does not handle the newline inside the anchor's title:

<a[\\s\\n\\r]+id=\"[constant]"[\\s\\n\\r]+href=\"[specific]"[\\s\\n\\r]*>[\\s\\n\\r]*[^\\n\\r]+[\\s\\n\\r]*</a>

Please help me
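The pattern above stops at the line break because of its [^\n\r]+ part: that class explicitly excludes \r and \n, so the title can never span two lines. (Also, \s already covers \r and \n, so [\s\n\r] is equivalent to \s.) A minimal sketch of the smallest change, assuming the id attribute always precedes href and the title contains no nested tags, is to capture the title with [^<]* instead, which does match line breaks. The class and method names (AnchorRegexSketch, TryFindAnchorTitle) are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

public static class AnchorRegexSketch
{
    // Hypothetical helper: finds <a id="..." href="...">title</a> and returns the title.
    // Assumes id comes before href and the title contains no nested tags.
    public static bool TryFindAnchorTitle(string html, string id, string href, out string title)
    {
        string pattern =
            @"<a\s+id=""" + Regex.Escape(id) + @"""" +
            @"\s+href=""" + Regex.Escape(href) + @"""" +
            @"\s*>" +
            // [^<]* accepts every character except '<', including '\r' and '\n',
            // so a title that spans several lines is captured as one group.
            @"(?<title>[^<]*)</a>";

        Match m = Regex.Match(html, pattern);
        title = m.Success ? m.Groups["title"].Value : string.Empty;
        return m.Success;
    }

    public static void Main()
    {
        string html = "<a id=\"[constant]\"\n      href=\"[specific]\"\n    >GlobalPlatform Card Specification 2.2\n    March, 2006</a>";

        if (TryFindAnchorTitle(html, "[constant]", "[specific]", out string title))
        {
            // Collapse the line break and indentation into single spaces for display.
            Console.WriteLine(Regex.Replace(title, @"\s+", " "));
            // prints "GlobalPlatform Card Specification 2.2 March, 2006"
        }
    }
}
```

Regex.Escape is used so that the brackets in values like [constant] are matched literally rather than read as a character class.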

You should stay away from regular expressions when it comes to parsing HTML, and use an HTML parser like the HTML Agility Pack instead.

To help you get started, see how simple it is to parse that single anchor tag:

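using System;
using HtmlAgilityPack;

// Load the fragment; the verbatim string keeps the line break inside the title.
HtmlDocument doc = new HtmlDocument();
doc.LoadHtml(@"<a id=""[constant]""
      href=""[specific]""
    >GlobalPlatform Card Specification 2.2
    March, 2006</a>
");

// First <a> element directly under the document root.
var anchor = doc.DocumentNode.Element("a");

Console.WriteLine(anchor.Id);                        // [constant]
Console.WriteLine(anchor.Attributes["href"].Value);  // [specific]
Console.WriteLine(anchor.InnerText);                 // the title, line break included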

Beats regular expressions, don't you think? :)

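If you are using C#, you can pass a RegexOptions value when creating the Regex. Note that RegexOptions.Multiline only makes ^ and $ match at the start and end of each line; the option that lets . match newline characters is RegexOptions.Singleline, and it only helps if the pattern uses . for the title instead of [^\n\r]+:

Regex r = new Regex(pattern, RegexOptions.Singleline);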
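A minimal sketch that runs the anchor from the question through a deliberately simplified pattern, <a\b[^>]*>(?<title>.*?)</a> (an assumption for illustration, not the original poster's regex), under each option. RegexOptionsSketch and Matches are hypothetical names. The point is that RegexOptions.Multiline leaves . unchanged, while RegexOptions.Singleline lets it step over the line break inside the title.

```csharp
using System;
using System.Text.RegularExpressions;

public static class RegexOptionsSketch
{
    // Simplified pattern (assumption for this sketch): any <a ...> tag,
    // a lazily matched title, then the closing </a>.
    const string Pattern = @"<a\b[^>]*>(?<title>.*?)</a>";

    public static bool Matches(string html, RegexOptions options) =>
        Regex.IsMatch(html, Pattern, options);

    public static void Main()
    {
        string html = "<a id=\"[constant]\"\n      href=\"[specific]\"\n    >GlobalPlatform Card Specification 2.2\n    March, 2006</a>";

        // Multiline only changes ^ and $; '.' still refuses '\n', so no match.
        Console.WriteLine(Matches(html, RegexOptions.Multiline));   // False
        // Singleline lets '.' match '\n', so the two-line title is matched.
        Console.WriteLine(Matches(html, RegexOptions.Singleline));  // True
    }
}
```

Under either option, the [^>]* in the opening tag already crosses the line breaks between attributes, because a negated character class matches \n without any flag; only the . in the title needs Singleline.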
