Consider this very simple piece of code:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using HtmlAgilityPack;

namespace WebScraper
{
    class Program
    {
        static void Main(string[] args)
        {
            HtmlDocument doc = new HtmlDocument();
            doc.LoadHtml("http://www.google.com");
            foreach (HtmlNode link in doc.DocumentNode.SelectNodes("//a[@href]"))
            {
            }
        }
    }
}
This effectively doesn't do anything at all, and is copied from/inspired by various other StackOverflow questions. It compiles, but running it throws a runtime error that says "Object reference not set to an instance of an object." and highlights the foreach line.
I can't understand why the environment objects to this humble, innocent and useless piece of code.
I would also like to know: does HtmlAgilityPack accept HTML classes as nodes?
If you want to load HTML from the web, you need to use the HtmlWeb object:
HtmlWeb web = new HtmlWeb();
HtmlDocument doc = web.Load(url); // url is the page address, e.g. "http://www.google.com"
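LoadHtml takes a string of actual HTML as an argument, not a URL. Given "http://www.google.com", it parses that text as the markup itself, so the document contains no a elements; SelectNodes returns null when nothing matches, and the foreach then throws the null reference error. Alternatively, you can pass Load a Stream from WebResponse.GetResponseStream() instead: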
// Requires: using System.IO; using System.Net;
HtmlDocument doc = new HtmlDocument();
WebRequest req = WebRequest.Create("http://www.google.com");
using (Stream s = req.GetResponse().GetResponseStream())
    doc.Load(s);
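Whichever way the document is loaded, it may be worth guarding the loop from the question, because SelectNodes returns null rather than an empty collection when no node matches. The following is only a minimal sketch: LinkExtractor and ExtractLinks are hypothetical names introduced for illustration, and the HTML is passed in as a string so the example runs without network access.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack;

static class LinkExtractor
{
    // Hypothetical helper (not part of HtmlAgilityPack): collects the href
    // value of every <a href="..."> element found in the given HTML text.
    public static List<string> ExtractLinks(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html); // expects markup, not a URL

        var hrefs = new List<string>();
        HtmlNodeCollection nodes = doc.DocumentNode.SelectNodes("//a[@href]");
        if (nodes == null)
        {
            // No matching nodes: return an empty list instead of throwing.
            return hrefs;
        }

        foreach (HtmlNode link in nodes)
        {
            hrefs.Add(link.GetAttributeValue("href", string.Empty));
        }
        return hrefs;
    }
}

class Program
{
    static void Main()
    {
        // A URL passed as "HTML" contains no <a> elements.
        Console.WriteLine(LinkExtractor.ExtractLinks("http://www.google.com").Count);

        // Real markup with one link.
        Console.WriteLine(LinkExtractor.ExtractLinks("<a href=\"http://www.google.com\">Google</a>").Count);
    }
}
```

The first call shows what happened in the question: the URL string is treated as markup, nothing matches, and the null check turns that into an empty result instead of an exception.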