Scraping URLs in C# with HTML Agility Pack

Question

OK, so there is a list of URLs on this web page, and I'd like to know how to grab those URLs and add them to an ArrayList.

http://www.animenewsnetwork.com/encyclopedia/anime.php?list=A

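I only want the URLs that are in the list; take a look at the page to see what I mean. I tried doing it myself, and for whatever reason it picks up all the other URLs on the page as well, not just the ones I need.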

   http://pastebin.com/a7hJnXPP

Answer 1

Using HTML Agility Pack:

using System.Linq;
using System.Net;

using (var wc = new WebClient())
{
    var doc = new HtmlAgilityPack.HtmlDocument();
    doc.LoadHtml(wc.DownloadString("http://www.animenewsnetwork.com/encyclopedia/anime.php?list=A"));
    // Take the <div class="lst"> element and collect the href of every <a> inside it.
    var links = doc.DocumentNode.SelectSingleNode("//div[@class='lst']")
        .Descendants("a")
        .Select(x => x.Attributes["href"].Value)
        .ToArray();
}

Answer 2

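If you only need what's in the list, the following code should work (this assumes you have already loaded the page into an HtmlDocument named animePage):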

// Requires: using System.Collections.Generic; using HtmlAgilityPack;
List<string> hrefList = new List<string>(); //Make a list cause lists are cool.

foreach (HtmlNode node in animePage.DocumentNode.SelectNodes("//a[contains(@href, 'id=')]"))
{
    //Append animenewsnetwork.com to the beginning of the href value and add it
    // to the list.
    hrefList.Add("http://www.animenewsnetwork.com" + node.GetAttributeValue("href", "null"));
}

The XPath //a[contains(@href, 'id=')] breaks down as follows:

//a selects all <a> nodes...
[contains(@href, 'id=')] ...that have an href attribute containing the text id=.
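
To see the filter in isolation, here is a minimal sketch that runs the same XPath expression over a small, made-up fragment. It uses System.Xml.XmlDocument in place of HTML Agility Pack so that it runs without the NuGet package; the class name XPathDemo, the method SelectIdLinks, and the sample markup are all assumptions made for this illustration, not part of the original page or of either library.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

public static class XPathDemo
{
    // Returns the href of every <a> element whose href contains "id=".
    public static List<string> SelectIdLinks(string xml)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xml);

        var result = new List<string>();
        foreach (XmlNode node in doc.SelectNodes("//a[contains(@href, 'id=')]"))
        {
            // The predicate already guarantees that an href attribute exists.
            result.Add(node.Attributes["href"].Value);
        }
        return result;
    }

    public static void Main()
    {
        // Hypothetical markup, invented for this example.
        const string sample =
            "<div>" +
            "<a href='/encyclopedia/anime.php?id=1'>First</a>" +
            "<a href='/about'>About</a>" +
            "<a href='/encyclopedia/anime.php?id=2'>Second</a>" +
            "<a name='top'>No href</a>" +
            "</div>";

        foreach (string href in SelectIdLinks(sample))
        {
            Console.WriteLine(href);
        }
    }
}
```

Only the two anime.php?id= links are printed; the /about link and the anchor without an href are filtered out by the predicate.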

That should be enough to get you going.

As an aside, given that there are around 500 links on that page, I'd suggest not showing each link in its own message box. 500 links = 500 message boxes :(
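
One possible alternative, sketched here under assumptions, is to build a single summary string and show or log that once. The names LinkReport, Summarize, and previewCount are hypothetical, and the sample URLs are invented; the resulting string could be passed to one message box, a text box, or the console.

```csharp
using System;
using System.Collections.Generic;

public static class LinkReport
{
    // Builds one multi-line string: a count line followed by at most
    // previewCount of the links, instead of one dialog per link.
    public static string Summarize(IReadOnlyList<string> links, int previewCount)
    {
        int shown = Math.Min(previewCount, links.Count);
        var lines = new List<string>
        {
            $"{links.Count} links found; showing the first {shown}:"
        };
        for (int i = 0; i < shown; i++)
        {
            lines.Add(links[i]);
        }
        return string.Join(Environment.NewLine, lines);
    }

    public static void Main()
    {
        // Invented sample data standing in for the scraped hrefList.
        var hrefList = new List<string>
        {
            "http://www.animenewsnetwork.com/encyclopedia/anime.php?id=1",
            "http://www.animenewsnetwork.com/encyclopedia/anime.php?id=2",
            "http://www.animenewsnetwork.com/encyclopedia/anime.php?id=3"
        };
        Console.WriteLine(Summarize(hrefList, 2));
    }
}
```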
