Web Scraping with C# and HtmlAgilityPack

Question

[Screenshot: code, error message, and variable values]

The goal is to take a word and get its part of speech from its Google definition.

I've tried a few different approaches, but I get a null reference error every time. Is my code failing to access the webpage? Is it a firewall issue, a logic issue, an {insert-issue-here} problem? I really wish I had even a vague idea of what is wrong.

Thanks for your time.

Addendum: I've also tried "//*[@id=\"source-luna\"]//div" and "//*[@id=\"source-luna\"]/div[1]" as XPath values.

    //attempt 1////////////////////////////////////////////////////////////////////////
    var term = "Hello";
    HttpWebRequest request = (HttpWebRequest)WebRequest.Create("http://www.urbandictionary.com/define.php?term=" + term);
    HttpWebResponse response = (HttpWebResponse)request.GetResponse();
    StreamReader stream = new StreamReader(response.GetResponseStream());
    string final_response = stream.ReadToEnd();
    MessageBox.Show(final_response); //doesn't execute

    //attempt 2////////////////////////////////////////////////////////////////////////
    var url = "https://www.google.co.za/search?q=define+position";
    var content = new System.Net.WebClient().DownloadString(url);
    var webGet = new HtmlWeb();
    var doc = new HtmlAgilityPack.HtmlDocument();
    doc.LoadHtml(content); //doc is null at runtime
    HtmlNode ourNode = doc.DocumentNode.SelectSingleNode("//*[@id=\"uid_0\"]/div[1]/div/div[1]/div[2]/div[2]/div[1]/i/span");
    if (ourNode != null)
    {
        richTextBox1.AppendText(ourNode.InnerText);
    }
    else
        richTextBox1.AppendText("null");

    //attempt 3////////////////////////////////////////////////////////////////////////
    var webGet = new HtmlWeb();
    var doc = webGet.Load("https://www.google.co.za/search?q=define+position"); //doc is null at runtime
    HtmlNode ourNode = doc.DocumentNode.SelectSingleNode("//*[@id=\"uid_0\"]/div[1]/div/div[1]/div[2]/div[2]/div[1]/i/span");
    if (ourNode != null)
    {
        richTextBox1.AppendText(ourNode.InnerText);
    }
    else
        richTextBox1.AppendText("null");

    //attempt 4////////////////////////////////////////////////////////////////////////
    string Url = "http://www.metacritic.com/game/pc/halo-spartan-assault";
    HtmlWeb web = new HtmlWeb();
    HtmlAgilityPack.HtmlDocument doc = web.Load(Url); //doc is null at runtime
    string metascore = doc.DocumentNode.SelectNodes("//*[@id=\"main\"]/div[3]/div/div[2]/div[1]/div[1]/div/div/div[2]/a/span[1]")[0].InnerText;
    string userscore = doc.DocumentNode.SelectNodes("//*[@id=\"main\"]/div[3]/div/div[2]/div[1]/div[2]/div[1]/div/div[2]/a/span[1]")[0].InnerText;
    string summary = doc.DocumentNode.SelectNodes("//*[@id=\"main\"]/div[3]/div/div[2]/div[2]/div[1]/ul/li/span[2]/span/span[1]")[0].InnerText;
    richTextBox1.AppendText(metascore + " " + userscore + " " + summary);

    //attempt 5////////////////////////////////////////////////////////////////////////
    HtmlWeb web = new HtmlWeb();
    HtmlAgilityPack.HtmlDocument html = web.Load("https://www.google.co.za/search?q=define+position"); //html is null
    var div = html.DocumentNode.SelectNodes("//*[@id=\"uid_0\"]/div[1]/div/div[1]/div[2]/div[2]/div[1]/i/span");
    richTextBox1.AppendText(Convert.ToString(div));

Answer 1

You are getting null because your XPath expressions are either incorrect or match no node in the HTML that was actually downloaded; SelectSingleNode and SelectNodes both return null when nothing matches. What are you trying to achieve here?
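
To illustrate the point about XPath lookups returning null, here is a minimal sketch that runs against an in-memory HTML string rather than a live page. The class name XPathProbe, the helper FirstTextOrNull, and the sample markup are all hypothetical and exist only for this example. One possible cause worth checking (an assumption, not something confirmed by the question) is that an XPath copied from the browser's developer tools describes the page after scripts have run, while HtmlAgilityPack only sees the raw HTML the server returned.

```csharp
using System;
using HtmlAgilityPack;

public static class XPathProbe
{
    // Hypothetical helper: returns the trimmed inner text of the first node
    // matching the XPath, or null when nothing matches.
    public static string FirstTextOrNull(string html, string xpath)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // SelectSingleNode returns null when no node matches the expression.
        HtmlNode node = doc.DocumentNode.SelectSingleNode(xpath);
        return node?.InnerText.Trim();
    }

    public static void Main()
    {
        // Made-up markup standing in for a downloaded page.
        const string html =
            "<html><body><div id='main'><i><span>noun</span></i></div></body></html>";

        // A short XPath anchored on an id that exists in the markup.
        Console.WriteLine(
            FirstTextOrNull(html, "//*[@id='main']//i/span") ?? "(no match)");

        // An id that is absent from the markup: the lookup yields null,
        // so the fallback text is printed instead of throwing.
        Console.WriteLine(
            FirstTextOrNull(html, "//*[@id='uid_0']//i/span") ?? "(no match)");
    }
}
```

Checking the result for null before reading InnerText or indexing with [0] avoids the null reference error, and writing the downloaded HTML to a file and searching it for the id you expect is a quick way to tell whether the XPath or the download is at fault.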

solution1
0 2017-02-21 18:34:15