I have an ASP.NET MVC application and an HTML page that I parse with HtmlAgilityPack. When I loop over the matched elements, the foreach throws this error: Object reference not set to an instance of an object. My code is below. Does anybody know where my mistake is? I'm new to HtmlAgilityPack.
Part of HTML:
<li class="b-serp-item i-bem" onclick="return {"b-serp-item":{}}">
<i class="b-serp-item__favicon" style="background-position: 0 -0px"></i>
<h2 class="b-serp-item__title">
<b class="b-serp-item__number">1</b>
<a class="b-serp-item__title-link" href="http://googlescraping.com/google-scraper.php">Google</a>
</h2>
</li>
Code:
DateTime dt = DateTime.Now;
string dtf = String.Format("{0:u}", dt);
string wp = "page" + dtf + ".html";
HtmlDocument HD = new HtmlDocument();
HD.Load(wp);
string output = "";
foreach (HtmlNode node in HD.DocumentNode.SelectNodes("//a[@class='b-serp-item__title-link']"))
{
output += node.GetAttributeValue("href", null) + " ";
}
I have shared the full HTML output on Google Drive: https://drive.google.com/file/d/0B3-m-r5Ce0gOSTlzUGlTT1VBb00/edit?usp=sharing
I ran your code with one slight change: I used HtmlDocument.LoadHtml(stringContents) instead of HtmlDocument.Load(path), and then it works flawlessly.
I suspect that the code is unable to find the file at that path. Ensure that the file exists using File.Exists(wp), and consider using a fully qualified path instead of just the file name, for example wp = Path.GetFullPath(wp).
Alternatively, read the contents first with string contents = File.ReadAllText(wp); and then pass them to the LoadHtml method on the HtmlDocument.
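The suggestions above can be combined into one short sketch. It assumes the HtmlAgilityPack package is referenced. The class name LinkExtractor and both method names are hypothetical, chosen only for illustration. It also guards against one behaviour worth knowing: SelectNodes returns null rather than an empty collection when nothing matches the XPath, and a foreach over null throws exactly the "Object reference not set to an instance of an object" error from the question.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using HtmlAgilityPack;

// Hypothetical helper class, not part of HtmlAgilityPack.
public static class LinkExtractor
{
    // Parses an HTML string and returns the href of every title link.
    public static List<string> ExtractLinks(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var links = new List<string>();

        // SelectNodes returns null when no node matches, so check before looping.
        HtmlNodeCollection nodes =
            doc.DocumentNode.SelectNodes("//a[@class='b-serp-item__title-link']");
        if (nodes == null)
        {
            return links; // nothing matched: return an empty list
        }

        foreach (HtmlNode node in nodes)
        {
            links.Add(node.GetAttributeValue("href", string.Empty));
        }
        return links;
    }

    // Resolves the path, verifies the file exists, then parses its contents.
    public static List<string> ExtractLinksFromFile(string path)
    {
        string fullPath = Path.GetFullPath(path);
        if (!File.Exists(fullPath))
        {
            throw new FileNotFoundException("HTML file not found", fullPath);
        }
        return ExtractLinks(File.ReadAllText(fullPath));
    }
}
```

Calling LinkExtractor.ExtractLinksFromFile(wp) would then either return the list of links or fail with a clear message naming the missing file, which should make it easier to tell a path problem apart from an XPath that matches nothing.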