I'm trying to get a list of images from a website and also save them to hard disk, but it doesn't work
I'm using HtmlAgilityPack.
In this function, imageNodes is empty: by the time the foreach runs, its count is 0. I don't understand why the list count is 0, because the website contains many images.
What I want is to get a list of the images from the site and show that list in richTextBox1. I also want to save all the images from the site to my hard disk.
How can I fix it?
public void GetAllImages()
{
    // Bing Image Result for Cat, First Page
    string url = "http://www.bing.com/images/search?q=cat&go=&form=QB&qs=n";

    // For speed of dev, I use a WebClient
    WebClient client = new WebClient();
    string html = client.DownloadString(url);

    // Load the Html into the agility pack
    HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
    doc.LoadHtml(html);

    // Now, using LINQ to get all Images
    List<HtmlNode> imageNodes = null;
    imageNodes = (from HtmlNode node in doc.DocumentNode.SelectNodes("//img")
                  where node.Name == "img"
                  && node.Attributes["class"] != null
                  && node.Attributes["class"].Value.StartsWith("img_")
                  select node).ToList();

    foreach (HtmlNode node in imageNodes)
    {
        // Console.WriteLine(node.Attributes["src"].Value);
        richTextBox1.Text += node.Attributes["src"].Value + Environment.NewLine;
    }
}
As far as I can see, the correct class of the Bing images is sg_t. You can obtain those HtmlNodes with the following LINQ query:
List<HtmlNode> imageNodes = doc.DocumentNode.Descendants("img")
    .Where(n => n.Attributes["class"] != null && n.Attributes["class"].Value == "sg_t")
    .ToList();
This list should then contain every img element with class = 'sg_t'.
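The question also asks how to save the images to disk. Below is a minimal sketch, assuming the src values collected from imageNodes are directly downloadable URLs. ImageSaver, SaveAll and the image_N.jpg naming scheme are hypothetical names chosen for illustration, and the .jpg extension is an assumption, since the thumbnail URLs carry no file extension.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Net;

public static class ImageSaver
{
    // Downloads each URL into targetFolder as image_0.jpg, image_1.jpg, ...
    // (hypothetical naming; the ".jpg" extension is assumed, not detected).
    // Returns the paths of the files that were written successfully.
    public static List<string> SaveAll(IEnumerable<string> imageUrls, string targetFolder)
    {
        // Creates the folder if needed; does nothing if it already exists.
        Directory.CreateDirectory(targetFolder);

        var saved = new List<string>();
        using (var client = new WebClient())
        {
            int index = 0;
            foreach (string url in imageUrls)
            {
                string path = Path.Combine(targetFolder, "image_" + index + ".jpg");
                index++;
                try
                {
                    client.DownloadFile(url, path);
                    saved.Add(path);
                }
                catch (WebException)
                {
                    // Skip an image that fails to download and continue with the rest.
                }
            }
        }
        return saved;
    }
}
```

With the imageNodes list from the query above, it could be called as ImageSaver.SaveAll(imageNodes.Select(n => n.Attributes["src"].Value), folder), where folder is whatever directory you choose. Note that this saves the thumbnails, because src points at the thumbnail rather than the full-size image.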
A quick look at the example page/URL in your code shows that the images you are after do not have a class starting with "img_":
<img class="sg_t" src="http://ts2.mm.bing.net/images/thumbnail.aspx?q=4588327016989297&id=db87e23954c9a0360784c0546cd1919c&url=http%3a%2f%2factnowtraining.files.wordpress.com%2f2012%2f02%2fcat.jpg" style="height:133px;top:2px">
I notice your code is targeting the thumbnails only. You also want the full-size image URLs, which are in the anchor surrounding each thumbnail. You will need to pull the final URL from an href that looks like this:
<a href="/images/search?q=cat&view=detail&id=89929E55C0136232A79DF760E3859B9952E22F69&first=0&FORM=IDFRIR" class="sg_tc" h="ID=API.images,18.1"><img class="sg_t" src="http://ts2.mm.bing.net/images/thumbnail.aspx?q=4588327016989297&id=db87e23954c9a0360784c0546cd1919c&url=http%3a%2f%2factnowtraining.files.wordpress.com%2f2012%2f02%2fcat.jpg" style="height:133px;top:2px"></a>
and decode the part that looks like:
url=http%3a%2f%2factnowtraining.files.wordpress.com%2f2012%2f02%2fcat.jpg
which decodes to:
http://actnowtraining.files.wordpress.com/2012/02/cat.jpg
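The decoding step can be sketched as follows. In the sample markup above, the url= parameter appears in the thumbnail's src attribute, so this sketch takes that string as input. BingThumbnail and ExtractFullSizeUrl are hypothetical names, and the code assumes the query string uses plain & separators (not HTML-escaped &amp;), as in the sample.

```csharp
using System;

public static class BingThumbnail
{
    // Returns the decoded value of the "url" query parameter,
    // or null if the input has no such parameter.
    public static string ExtractFullSizeUrl(string thumbnailSrc)
    {
        int queryStart = thumbnailSrc.IndexOf('?');
        if (queryStart < 0)
        {
            return null;
        }

        string query = thumbnailSrc.Substring(queryStart + 1);
        foreach (string pair in query.Split('&'))
        {
            int eq = pair.IndexOf('=');
            if (eq < 0)
            {
                continue; // not a name=value pair
            }

            if (pair.Substring(0, eq) == "url")
            {
                // Undo the percent-encoding (%3a becomes ':', %2f becomes '/').
                return Uri.UnescapeDataString(pair.Substring(eq + 1));
            }
        }
        return null;
    }
}
```

Passing the src value from the sample thumbnail returns http://actnowtraining.files.wordpress.com/2012/02/cat.jpg. Whether Bing still emits this parameter on current pages is not something the sample can confirm, so treat a null result as "no full-size URL found".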