NullReferenceException when trying to get links by class with HtmlAgilityPack

I have an ASP.NET MVC application that parses an HTML page with HtmlAgilityPack. When I loop over the matched elements, the foreach throws this error: "Object reference not set to an instance of an object." My code is below. Does anybody know where my mistake is? I'm new to HtmlAgilityPack.

Part of HTML:

<li class="b-serp-item i-bem" onclick="return {&quot;b-serp-item&quot;:{}}">
  <i class="b-serp-item__favicon" style="background-position: 0 -0px"></i>
  <h2 class="b-serp-item__title">
    <b class="b-serp-item__number">1</b>
    <a class="b-serp-item__title-link" href="http://googlescraping.com/google-scraper.php">Google</a>
  </h2>
</li>

Code:

DateTime dt = DateTime.Now;
string dtf = String.Format("{0:u}", dt);
string wp = "page" + dtf + ".html";
HtmlDocument HD = new HtmlDocument();
HD.Load(wp);
string output = "";
foreach (HtmlNode node in HD.DocumentNode.SelectNodes("//a[@class='b-serp-item__title-link']"))
{
    output += node.GetAttributeValue("href", null) + " ";
}

I have shared the full HTML output on Google Drive: https://drive.google.com/file/d/0B3-m-r5Ce0gOSTlzUGlTT1VBb00/edit?usp=sharing

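I ran your code with one slight change: I used HtmlDocument.LoadHtml(stringContents) instead of HtmlDocument.Load(path), and it worked without errors. Note that SelectNodes returns null, not an empty collection, when nothing matches the XPath expression, so a document that was never loaded with the expected content makes the foreach throw exactly this exception.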

I suspect the code cannot find the file at that path. Check that the file exists with File.Exists(wp), and consider using a fully qualified path instead of the bare file name, for example wp = Path.GetFullPath(wp).
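The path check described above could look roughly like the sketch below. The file name "page.html" is a placeholder chosen for illustration, not the name from the question. Relative paths resolve against the process's working directory, which in a web application is often not the project folder, so printing the resolved path can show where the file is actually being looked for.

```csharp
using System;
using System.IO;

class PathCheck
{
    static void Main()
    {
        // Hypothetical file name; substitute the path your application actually writes to.
        string wp = "page.html";

        // Resolve the relative name against the current working directory
        // so the location being checked is explicit.
        string fullPath = Path.GetFullPath(wp);

        if (!File.Exists(fullPath))
        {
            Console.WriteLine("File not found: " + fullPath);
            return;
        }

        Console.WriteLine("Found: " + fullPath);
    }
}
```

One thing worth checking in the question's code: the "u" format specifier produces a timestamp such as 2014-03-15 10:30:00Z, and the colons in it are not valid in Windows file names.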

Alternatively, read the file first with string contents = File.ReadAllText(wp); and then pass the contents to the LoadHtml method on the HtmlDocument.
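A minimal sketch of that approach, assuming the HtmlAgilityPack package is referenced and using a placeholder file name ("page.html") rather than the timestamped one from the question. It also guards against SelectNodes returning null, which is what happens when the XPath expression matches nothing.

```csharp
using System;
using System.IO;
using System.Text;
using HtmlAgilityPack;

class LinkExtractor
{
    static void Main()
    {
        // Hypothetical file name; replace with the real location of the saved page.
        string wp = Path.GetFullPath("page.html");

        // File.ReadAllText throws FileNotFoundException if the file is missing,
        // which makes a wrong path obvious immediately.
        string contents = File.ReadAllText(wp);

        var doc = new HtmlDocument();
        doc.LoadHtml(contents);

        // SelectNodes returns null (not an empty collection) when nothing matches.
        HtmlNodeCollection nodes =
            doc.DocumentNode.SelectNodes("//a[@class='b-serp-item__title-link']");

        if (nodes == null)
        {
            Console.WriteLine("No matching links found.");
            return;
        }

        var output = new StringBuilder();
        foreach (HtmlNode node in nodes)
        {
            // Fall back to an empty string if an anchor has no href attribute.
            output.Append(node.GetAttributeValue("href", "")).Append(' ');
        }

        Console.WriteLine(output.ToString().TrimEnd());
    }
}
```

With the HTML fragment from the question saved as the input file, this should print the single href from the b-serp-item__title-link anchor.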
