Scraping URLs with HtmlAgilityPack in C#

Question

OK, so I have a list of URLs on this web page, and I am wondering how I can grab those URLs and add them to an ArrayList?

http://www.animenewsnetwork.com/encyclopedia/anime.php?list=A

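I only want the URLs from the list; have a look at the page and you will see what I mean. I tried doing it myself, but for whatever reason it picks up all the other URLs on the page rather than the ones I need.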

   http://pastebin.com/a7hJnXPP

Answer 1

Using HTML Agility Pack:

using System.Linq;
using System.Net;

using (var wc = new WebClient())
{
    HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
    doc.LoadHtml(wc.DownloadString("http://www.animenewsnetwork.com/encyclopedia/anime.php?list=A"));
    // Take the first <div class="lst"> and collect the href of every <a> inside it.
    var links = doc.DocumentNode.SelectSingleNode("//div[@class='lst']")
        .Descendants("a")
        .Select(x => x.Attributes["href"].Value)
        .ToArray();
}

Answer 2

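If you only need the links in the list, the following code should work (this assumes you have already loaded the page into an HtmlDocument named animePage):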

List<string> hrefList = new List<string>(); //Make a list, because lists are cool.

foreach (HtmlNode node in animePage.DocumentNode.SelectNodes("//a[contains(@href, 'id=')]"))
{
    //Prepend the animenewsnetwork.com host to the href value and add it
    // to the list.
    hrefList.Add("http://www.animenewsnetwork.com" + node.GetAttributeValue("href", "null"));
}

The XPath //a[contains(@href, 'id=')] breaks down as follows:

//a selects all <a> nodes...
[contains(@href, 'id=')] ...that have an href attribute containing the text id=.
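To see the filter in isolation, here is a minimal sketch that runs the same XPath string against a small, made-up XHTML snippet. It is an illustration under assumptions, not the code from either answer: it uses the .NET standard library (XDocument plus XPathSelectElements) instead of Html Agility Pack so that it runs without extra packages, and the names XPathDemo, Sample and ExtractLinks are hypothetical. It also resolves each relative href against the site root with Uri rather than string concatenation.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;
using System.Xml.XPath;

public static class XPathDemo
{
    // Hypothetical sample markup; the real page's structure may differ.
    public const string Sample = @"<html><body>
  <a href='/encyclopedia/anime.php?id=1'>First</a>
  <a href='/encyclopedia/anime.php?list=B'>Letter B</a>
  <a name='top'>No href at all</a>
  <a href='/encyclopedia/anime.php?id=2'>Second</a>
</body></html>";

    // Returns the absolute URL of every <a> whose href contains "id=".
    public static List<string> ExtractLinks(string xhtml, Uri baseUri)
    {
        XDocument doc = XDocument.Parse(xhtml);
        return doc.XPathSelectElements("//a[contains(@href, 'id=')]")
            // Resolve the relative href against the base address.
            .Select(a => new Uri(baseUri, (string)a.Attribute("href")).AbsoluteUri)
            .ToList();
    }

    public static void Main()
    {
        var baseUri = new Uri("http://www.animenewsnetwork.com/");
        foreach (string url in ExtractLinks(Sample, baseUri))
        {
            Console.WriteLine(url);
        }
    }
}
```

Only the two links whose href contains id= are printed; the list=B link and the anchor without an href are skipped. Real-world HTML is usually not well-formed XML, so XDocument.Parse would likely throw on the actual page, which is why the answers load it with Html Agility Pack; the XPath string itself stays the same.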

That should be enough to get you going.

As an aside, I would suggest not showing each link in its own message box, considering there are around 500 links on that page. 500 links = 500 message boxes :(

2 answers

Answer 1
0 2012-08-19 08:03:28

Answer 2
0 2012-08-19 08:36:00