I need a regular expression in C# that matches an anchor tag in HTML source code as generally as possible. Consider this HTML:
<a id="[constant]"
href="[specific]"
>GlobalPlatform Card Specification 2.2
March, 2006</a>
By [constant] I mean the value is a constant string, so it poses no problem. By [specific] I mean the address is a simple, specific string, so the regular expression for it is simple. The main problem is that I cannot handle the newline character in the middle of the anchor tag's title. I previously wrote the following regular expression, which works well except when the title contains a newline:
<a[\\s\\n\\r]+id=\"[constant]"[\\s\\n\\r]+href=\"[specific]"[\\s\\n\\r]*>[\\s\\n\\r]*[^\\n\\r]+[\\s\\n\\r]*</a>
Please help me
You should stay away from regular expressions when it comes to parsing HTML and use an HTML parser such as the HTML Agility Pack.
To help you get started, here is how simple it is to parse that single anchor tag:
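using System;
using HtmlAgilityPack;

// Load the fragment; the newline inside the tag and the title is not a problem for the parser.
HtmlDocument doc = new HtmlDocument();
doc.LoadHtml(@"<a id=""[constant]""
href=""[specific]""
>GlobalPlatform Card Specification 2.2
March, 2006</a>
");
// Take the first <a> child of the document root.
var anchor = doc.DocumentNode.Element("a");
Console.WriteLine(anchor.Id);                        // the id attribute
Console.WriteLine(anchor.Attributes["href"].Value);  // the href attribute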
Beats regular expressions, don't you think? :)
If you are using C#, you can set the Multiline option when creating the Regex:
Regex r = new Regex(pattern, RegexOptions.Multiline);
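One caveat worth checking before relying on that option: RegexOptions.Multiline only changes where ^ and $ match, while RegexOptions.Singleline is the option that lets . match a newline. In the pattern from the question, the part that rejects the line break is [^\\n\\r]+, and no option changes what a negated character class matches. Below is a minimal sketch of two possible workarounds, assuming the title never contains a '<'. The values "doc1" and "spec.pdf" are hypothetical stand-ins for [constant] and [specific], and the class and method names are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class AnchorRegexDemo
{
    // Hypothetical stand-ins: "doc1" for [constant], "spec.pdf" for [specific].
    // \s already covers \n and \r, and [^<]+ matches any character except '<',
    // including newlines, so the title may span several lines.
    private const string Pattern =
        @"<a\s+id=""doc1""\s+href=""spec\.pdf""\s*>(?<title>[^<]+)</a>";

    // Returns the anchor title with whitespace runs collapsed, or null if there is no match.
    public static string ExtractTitle(string html)
    {
        Match m = Regex.Match(html, Pattern);
        if (!m.Success) return null;
        return Regex.Replace(m.Groups["title"].Value, @"\s+", " ").Trim();
    }

    // Tries a looser pattern that uses '.' for the title, under the given options.
    public static bool DotPatternMatches(string html, RegexOptions options)
    {
        return Regex.IsMatch(html, @"<a\s[^>]*>.*?</a>", options);
    }

    public static void Main()
    {
        string html = "<a id=\"doc1\"\nhref=\"spec.pdf\"\n>GlobalPlatform Card Specification 2.2\nMarch, 2006</a>";

        Console.WriteLine(ExtractTitle(html));
        // prints: GlobalPlatform Card Specification 2.2 March, 2006

        // '.' only crosses the line break when Singleline is set.
        Console.WriteLine(DotPatternMatches(html, RegexOptions.Singleline)); // True
        Console.WriteLine(DotPatternMatches(html, RegexOptions.Multiline));  // False
    }
}
```

Either form handles the line break in the title, but both remain fragile against attribute reordering or nested markup inside the anchor, which is where a parser is the safer choice.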