I'm trying to get a list of images from a website and save them to my hard disk, but it's not working

Question

I'm using HtmlAgilityPack.

In this function, imageNodes has a count of 0 by the time the foreach loop runs.

I don't understand why the list count is 0.

The website contains many images. What I want is to get a list of the images from the site and show that list in richTextBox1, and I also want to save all the images from the site to my hard disk.

How can I fix it?

public void GetAllImages()
{
   // Bing Image Result for Cat, First Page
   string url = "http://www.bing.com/images/search?q=cat&go=&form=QB&qs=n";

   // For speed of dev, I use a WebClient
   WebClient client = new WebClient();
   string html = client.DownloadString(url);

   // Load the Html into the agility pack
   HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
   doc.LoadHtml(html);

   // Now, using LINQ to get all Images
   List<HtmlNode> imageNodes = null;
   imageNodes = (from HtmlNode node in doc.DocumentNode.SelectNodes("//img")
                 where node.Name == "img"
                    && node.Attributes["class"] != null
                    && node.Attributes["class"].Value.StartsWith("img_")
                 select node).ToList();

   foreach (HtmlNode node in imageNodes)
   {
      // Console.WriteLine(node.Attributes["src"].Value);
      richTextBox1.Text += node.Attributes["src"].Value + Environment.NewLine;
   }
}

Answer 1

As far as I can see, the correct class for the Bing images is sg_t. You can obtain those HtmlNodes with the following LINQ query:

List<HtmlNode> imageNodes = doc.DocumentNode.Descendants("img")
    .Where(n=> n.Attributes["class"] != null && n.Attributes["class"].Value == "sg_t")
    .ToList();

This list should then contain every img element with class = 'sg_t'.
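A minimal sketch of how that query might be wrapped so the src values can be listed, assuming the thumbnails still carry the sg_t class (Bing's markup can change at any time). ImageLister and GetImageSources are hypothetical names used only for illustration, not part of HtmlAgilityPack. GetAttributeValue is used so that images without a class or src attribute are skipped rather than throwing.

```csharp
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

public static class ImageLister
{
    // Hypothetical helper: returns the src of every <img> whose class
    // attribute is exactly cssClass (for example "sg_t").
    public static List<string> GetImageSources(string html, string cssClass)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        return doc.DocumentNode.Descendants("img")
            // GetAttributeValue returns the default ("") when the attribute is missing
            .Where(n => n.GetAttributeValue("class", "") == cssClass)
            .Select(n => n.GetAttributeValue("src", ""))
            .Where(src => src.Length > 0)
            .ToList();
    }
}
```

In the question's form, the result could then be shown with something like richTextBox1.Text = string.Join(Environment.NewLine, ImageLister.GetImageSources(html, "sg_t")), which also avoids appending to the Text property once per image.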

Answer 2

A quick look at the example page/URL in your code shows that the images you are after do not have a class starting with "img_".

<img class="sg_t" src="http://ts2.mm.bing.net/images/thumbnail.aspx?q=4588327016989297&amp;id=db87e23954c9a0360784c0546cd1919c&amp;url=http%3a%2f%2factnowtraining.files.wordpress.com%2f2012%2f02%2fcat.jpg" style="height:133px;top:2px">

I notice your code is targeting the thumbnails only. You probably also want the full-size image URLs, which are found in the anchor surrounding each thumbnail. You will need to pull the final URL from an href that looks like this:

<a href="/images/search?q=cat&amp;view=detail&amp;id=89929E55C0136232A79DF760E3859B9952E22F69&amp;first=0&amp;FORM=IDFRIR" class="sg_tc" h="ID=API.images,18.1"><img class="sg_t" src="http://ts2.mm.bing.net/images/thumbnail.aspx?q=4588327016989297&amp;id=db87e23954c9a0360784c0546cd1919c&amp;url=http%3a%2f%2factnowtraining.files.wordpress.com%2f2012%2f02%2fcat.jpg" style="height:133px;top:2px"></a>

and decode the part that looks like: url=http%3a%2f%2factnowtraining.files.wordpress.com%2f2012%2f02%2fcat.jpg

which decodes to: http://actnowtraining.files.wordpress.com/2012/02/cat.jpg
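The decoding step can be sketched roughly as follows, assuming the original image address is carried in a url= query parameter, as it is in the thumbnail src of the sample markup above. ThumbnailUrl and ExtractOriginalUrl are hypothetical names. Only standard .NET calls are used (WebUtility.HtmlDecode and Uri.UnescapeDataString).

```csharp
using System;
using System.Net;

public static class ThumbnailUrl
{
    // Hypothetical helper: given a thumbnail address as it appears in the
    // HTML, returns the decoded value of its "url" query parameter,
    // or null when there is no such parameter.
    public static string ExtractOriginalUrl(string thumbnailSrc)
    {
        // Attribute values copied from markup may still contain &amp;
        string src = WebUtility.HtmlDecode(thumbnailSrc);

        int queryStart = src.IndexOf('?');
        if (queryStart < 0) return null;

        foreach (string pair in src.Substring(queryStart + 1).Split('&'))
        {
            if (pair.StartsWith("url=", StringComparison.OrdinalIgnoreCase))
            {
                // Percent-decoding turns %3a%2f%2f back into ://
                return Uri.UnescapeDataString(pair.Substring(4));
            }
        }
        return null;
    }
}
```

Once a full address has been recovered this way, saving it to disk could be done with WebClient.DownloadFile(address, fileName). The target file name is up to you, and some hosts may refuse or redirect the request.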
