I'm trying to get a list of images from a website and save them to the hard disk, but it doesn't work

I'm using HtmlAgilityPack.

In this function, imageNodes is empty by the time the foreach runs: its Count is 0.

I don't understand why the list count is 0.

The website contains many images. I want to get a list of the images from the site, show that list in richTextBox1, and also save all the images from the site to my hard disk.

How can I fix it?

public void GetAllImages()
{
   // Bing Image Result for Cat, First Page
   string url = "http://www.bing.com/images/search?q=cat&go=&form=QB&qs=n";

   // For speed of dev, I use a WebClient
   WebClient client = new WebClient();
   string html = client.DownloadString(url);

   // Load the Html into the agility pack
   HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
   doc.LoadHtml(html);

   // Now, using LINQ to get all Images
   List<HtmlNode> imageNodes = null;
   imageNodes = (from HtmlNode node in doc.DocumentNode.SelectNodes("//img")
                 where node.Name == "img"
                    && node.Attributes["class"] != null
                    && node.Attributes["class"].Value.StartsWith("img_")
                 select node).ToList();

   foreach (HtmlNode node in imageNodes)
   {
      // Console.WriteLine(node.Attributes["src"].Value);
      richTextBox1.Text += node.Attributes["src"].Value + Environment.NewLine;
   }
}

As far as I can see, the correct class of the Bing images is sg_t. You can obtain those HtmlNodes with the following LINQ query:

List<HtmlNode> imageNodes = doc.DocumentNode.Descendants("img")
    .Where(n=> n.Attributes["class"] != null && n.Attributes["class"].Value == "sg_t")
    .ToList();

This list should be filled with all the img elements with class = 'sg_t'.
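The question also asks about saving the images to disk, which the query above does not cover. Here is a minimal sketch of one way to do it, assuming you have already collected the src values into a list of strings. The names ImageSaver, BuildLocalPath and SaveAll are hypothetical, and the fallback to a ".jpg" extension for thumbnail.aspx URLs is an assumption, not something the page guarantees.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Net;

public static class ImageSaver
{
    // Hypothetical helper: builds a local path such as "folder/image_3.jpg".
    // An index is used because thumbnail URLs carry no usable file name.
    public static string BuildLocalPath(string folder, int index, string url)
    {
        string ext = Path.GetExtension(new Uri(url).AbsolutePath);

        // Assumption: endpoints like thumbnail.aspx serve JPEG data.
        if (string.IsNullOrEmpty(ext) ||
            ext.Equals(".aspx", StringComparison.OrdinalIgnoreCase))
        {
            ext = ".jpg";
        }
        return Path.Combine(folder, "image_" + index + ext);
    }

    // Downloads every absolute URL in the list into the given folder.
    public static void SaveAll(IEnumerable<string> urls, string folder)
    {
        Directory.CreateDirectory(folder);
        using (WebClient client = new WebClient())
        {
            int index = 0;
            foreach (string url in urls)
            {
                Uri parsed;
                if (!Uri.TryCreate(url, UriKind.Absolute, out parsed))
                {
                    continue; // skip relative or malformed URLs
                }
                try
                {
                    client.DownloadFile(parsed, BuildLocalPath(folder, index, url));
                    index++;
                }
                catch (WebException)
                {
                    // Skip images that fail to download and carry on.
                }
            }
        }
    }
}
```

With the imageNodes list from the query, something like ImageSaver.SaveAll(imageNodes.Select(n => n.GetAttributeValue("src", "")), @"C:\temp\images") would save each thumbnail; the folder path here is only an example.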

A quick look at the example page/URL in your code shows that the images you are after do not have a class starting with "img_".

<img class="sg_t" src="http://ts2.mm.bing.net/images/thumbnail.aspx?q=4588327016989297&amp;id=db87e23954c9a0360784c0546cd1919c&amp;url=http%3a%2f%2factnowtraining.files.wordpress.com%2f2012%2f02%2fcat.jpg" style="height:133px;top:2px">

I notice your code is targeting the thumbnails only. You also want the full-size image URLs, which are in the anchor surrounding each thumbnail. You will need to pull the final URL from an href that looks like this:

<a href="/images/search?q=cat&amp;view=detail&amp;id=89929E55C0136232A79DF760E3859B9952E22F69&amp;first=0&amp;FORM=IDFRIR" class="sg_tc" h="ID=API.images,18.1"><img class="sg_t" src="http://ts2.mm.bing.net/images/thumbnail.aspx?q=4588327016989297&amp;id=db87e23954c9a0360784c0546cd1919c&amp;url=http%3a%2f%2factnowtraining.files.wordpress.com%2f2012%2f02%2fcat.jpg" style="height:133px;top:2px"></a>

and decode the part that looks like: url=http%3a%2f%2factnowtraining.files.wordpress.com%2f2012%2f02%2fcat.jpg

which decodes to: http://actnowtraining.files.wordpress.com/2012/02/cat.jpg
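The decoding step can be sketched as follows. In the sample markup above, the url= parameter sits in the query string of the thumbnail's src value, so this example works on that string. ThumbnailUrl and ExtractOriginalUrl are hypothetical names; the sketch uses only WebUtility.HtmlDecode and Uri.UnescapeDataString from the base class library, and it assumes the parameter is always called "url".

```csharp
using System;
using System.Net;

public static class ThumbnailUrl
{
    // Hypothetical helper: returns the decoded value of the "url" query
    // parameter from a thumbnail src value, or null if there is none.
    public static string ExtractOriginalUrl(string src)
    {
        // Values copied from raw HTML may still contain "&amp;" entities.
        string decoded = WebUtility.HtmlDecode(src);

        int queryStart = decoded.IndexOf('?');
        if (queryStart < 0)
        {
            return null;
        }

        foreach (string pair in decoded.Substring(queryStart + 1).Split('&'))
        {
            if (pair.StartsWith("url=", StringComparison.OrdinalIgnoreCase))
            {
                // Percent-decoding turns "%3a%2f%2f" back into "://".
                return Uri.UnescapeDataString(pair.Substring(4));
            }
        }
        return null;
    }
}
```

Passing the src value of the sample img element to ThumbnailUrl.ExtractOriginalUrl should give back the wordpress.com address shown above.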
