
Finding href, id and classes of all links in a web page

I need to find all links in a web page, along with each link's href, id and class attributes.

Each link has at most one href and one id, but it can have multiple classes, so the classes need to be captured in a list.

I found this, which finds all links and their href values, and I also found the HtmlAgilityPack.

I'm not familiar with HTML parsing, so I would appreciate it if someone could help me get the id and classes for the links.

Any help is sincerely appreciated.

Thanks

HtmlAgilityPack is a great tool. You can use LINQ to search for all 'a' tags in a web page.

Let's look at this sample:

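 using System;
 using System.Collections.Generic;
 using HtmlAgilityPack;

 // Download and parse the page.
 HtmlDocument doc = new HtmlWeb().Load("http://www.google.com");

 // Collect every <a> element in the document.
 IEnumerable<HtmlNode> linkedPages = doc.DocumentNode.Descendants("a");
 foreach (var item in linkedPages)
 {
     // GetAttributeValue returns the fallback (an empty string) when the attribute is missing.
     Console.WriteLine("Href : " + item.GetAttributeValue("href", string.Empty) +
         " id : " + item.GetAttributeValue("id", string.Empty) +
         " class : " + item.GetAttributeValue("class", string.Empty));
 }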
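
The question asks for the classes as a list, while the class attribute comes back as one whitespace-separated string. A minimal sketch of one way to split it, assuming the same HtmlAgilityPack calls as above: the `LinkClasses` class, the `SplitClasses` helper and the inline sample markup are hypothetical names and data introduced only for illustration, and `LoadHtml` is used instead of a web request so the example runs without network access.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack;

public static class LinkClasses
{
    // Hypothetical helper (not part of HtmlAgilityPack): splits a raw class
    // attribute value on whitespace and drops empty entries.
    public static List<string> SplitClasses(string classAttribute)
    {
        var separators = new[] { ' ', '\t', '\r', '\n', '\f' };
        return new List<string>(
            classAttribute.Split(separators, StringSplitOptions.RemoveEmptyEntries));
    }

    public static void Main()
    {
        // Inline sample markup (an assumption made for this sketch).
        var doc = new HtmlDocument();
        doc.LoadHtml("<a href='/home' id='nav-home' class='btn  btn-primary'>Home</a>" +
                     "<a href='/about'>About</a>");

        foreach (HtmlNode link in doc.DocumentNode.Descendants("a"))
        {
            string href = link.GetAttributeValue("href", string.Empty);
            string id = link.GetAttributeValue("id", string.Empty);
            // A link without a class attribute yields an empty list.
            List<string> classes = SplitClasses(link.GetAttributeValue("class", string.Empty));

            Console.WriteLine("Href : " + href + " id : " + id +
                " classes : [" + string.Join(", ", classes) + "]");
        }
    }
}
```

Splitting with `StringSplitOptions.RemoveEmptyEntries` means repeated spaces between class names do not produce empty entries, and a link with no class attribute simply gets an empty list.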

Ludo
