I'm using HtmlAgilityPack. In the function below, imageNodes has a count of 0 by the time the foreach loop runs, and I don't understand why.
The website contains many images. I want to get a list of the images from the site and show it in richTextBox1, and I also want to save all the images from the site to my hard disk.
How can I fix it?
public void GetAllImages()
{
    // Bing Image Result for Cat, First Page
    string url = "http://www.bing.com/images/search?q=cat&go=&form=QB&qs=n";

    // For speed of dev, I use a WebClient
    WebClient client = new WebClient();
    string html = client.DownloadString(url);

    // Load the Html into the agility pack
    HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
    doc.LoadHtml(html);

    // Now, using LINQ to get all Images
    List<HtmlNode> imageNodes = null;
    imageNodes = (from HtmlNode node in doc.DocumentNode.SelectNodes("//img")
                  where node.Name == "img"
                  && node.Attributes["class"] != null
                  && node.Attributes["class"].Value.StartsWith("img_")
                  select node).ToList();

    foreach (HtmlNode node in imageNodes)
    {
        // Console.WriteLine(node.Attributes["src"].Value);
        richTextBox1.Text += node.Attributes["src"].Value + Environment.NewLine;
    }
}
As far as I can see, the correct class for the Bing images is sg_t. You can obtain those HtmlNodes with the following LINQ query:
List<HtmlNode> imageNodes = doc.DocumentNode.Descendants("img")
    .Where(n => n.Attributes["class"] != null && n.Attributes["class"].Value == "sg_t")
    .ToList();
This list should contain every img element whose class is "sg_t".
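As a rough sketch of the same idea, the query can be wrapped in a helper that takes the HTML and the class name and returns the src values directly. The names ImageLister and GetImageSources are made up for illustration, and the class value "sg_t" is simply what the page used when this was written, so it may well differ today.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

public static class ImageLister
{
    // Hypothetical helper: returns the src of every <img> whose class
    // attribute is exactly cssClass.
    public static List<string> GetImageSources(string html, string cssClass)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        return doc.DocumentNode
            .Descendants("img")
            // GetAttributeValue returns the fallback ("") when the attribute
            // is missing, so no separate null check is needed.
            .Where(n => n.GetAttributeValue("class", "") == cssClass)
            .Select(n => n.GetAttributeValue("src", ""))
            .Where(src => src.Length > 0)
            .ToList();
    }
}
```

In the form, the result could then be shown with something like richTextBox1.Text = string.Join(Environment.NewLine, ImageLister.GetImageSources(html, "sg_t")). Descendants yields an empty sequence when nothing matches, whereas SelectNodes returns null in that case, which is one reason to prefer it here.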
A quick look at the page at the URL in your code shows that the images you are after do not have a class starting with "img_". For example:
<img class="sg_t" src="http://ts2.mm.bing.net/images/thumbnail.aspx?q=4588327016989297&id=db87e23954c9a0360784c0546cd1919c&url=http%3a%2f%2factnowtraining.files.wordpress.com%2f2012%2f02%2fcat.jpg" style="height:133px;top:2px">
I notice your code targets the thumbnails only. You probably also want the full-size image URLs, which are available from the anchor surrounding each thumbnail. You will need to pull the final URL from an anchor that looks like this:
<a href="/images/search?q=cat&view=detail&id=89929E55C0136232A79DF760E3859B9952E22F69&first=0&FORM=IDFRIR" class="sg_tc" h="ID=API.images,18.1"><img class="sg_t" src="http://ts2.mm.bing.net/images/thumbnail.aspx?q=4588327016989297&id=db87e23954c9a0360784c0546cd1919c&url=http%3a%2f%2factnowtraining.files.wordpress.com%2f2012%2f02%2fcat.jpg" style="height:133px;top:2px"></a>
and decode the part that looks like: url=http%3a%2f%2factnowtraining.files.wordpress.com%2f2012%2f02%2fcat.jpg
which decodes to: http://actnowtraining.files.wordpress.com/2012/02/cat.jpg
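A minimal sketch of that decoding step, assuming the address carries the original location in a query parameter named url as in the sample markup above (where it appears in the thumbnail's src). BingThumbnail and ExtractOriginalUrl are illustrative names, not part of any library; only the standard Uri.UnescapeDataString call does the actual decoding.

```csharp
using System;

public static class BingThumbnail
{
    // Hypothetical helper: returns the decoded value of the "url" query
    // parameter, or null when the address has no such parameter.
    public static string ExtractOriginalUrl(string address)
    {
        int queryStart = address.IndexOf('?');
        if (queryStart < 0)
        {
            return null;
        }

        string query = address.Substring(queryStart + 1);
        foreach (string pair in query.Split('&'))
        {
            if (pair.StartsWith("url=", StringComparison.OrdinalIgnoreCase))
            {
                // Turn percent-escapes such as %3a and %2f back into ':' and '/'.
                return Uri.UnescapeDataString(pair.Substring("url=".Length));
            }
        }

        return null;
    }
}
```

Calling ExtractOriginalUrl on node.Attributes["src"].Value for each sg_t image would give the list of full-size addresses. If the attribute value still contains HTML entities such as &amp;, it would need to be de-entitized first, which this sketch does not attempt.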