Using lxml and xpath to get text from a webpage

Question

I'm trying to pull a number off of a webpage, specifically the current presidential approval rating from RealClearPolitics.

Here's the code I'm using: urllib2 fetches the webpage, lxml parses it, and the XPath is the one Chrome reports. The problem is that all I get at the end is an empty list.

import urllib2
from lxml import etree

url = "http://www.realclearpolitics.com/epolls/other/president_obama_job_approval-1044.html"
page = urllib2.urlopen(url)

# urllib2 responses are file-like and have no .content attribute, so pass the response itself
tree = etree.parse(page, etree.HTMLParser())

rcp = tree.xpath('//*[@id="polling-data-rcp"]/table/tbody/tr[2]/td[4]')

print rcp

Any help would be appreciated!

Answer 1

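The tr[2]/td[4] step in your XPath is not right, so the query matches nothing.

You need an XPath query that matches the page's actual markup. The Python code below uses one query for the approve figure and one for the disapprove figure: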

import requests
from lxml import html

URL = "http://www.realclearpolitics.com/epolls/other/president_obama_job_approval-1044.html"
response = requests.get(URL)
tree = html.fromstring(response.content)

rcp_approve = '//table[@class="chart_legend small_legend"]/tbody/tr/td[@class="candidate"][1]/div[1]/span/text()'
rcp_disapprove = '//table[@class="chart_legend small_legend"]/tbody/tr/td[@class="candidate"][2]/div[1]/span/text()'

rcp_approve = float(tree.xpath(rcp_approve)[0])
rcp_disapprove = float(tree.xpath(rcp_disapprove)[0])

print "Obama's approve rate: {}".format(rcp_approve)
print "Obama's disapprove rate: {}".format(rcp_disapprove)

Output:

Obama's approve rate: 44.4
Obama's disapprove rate: 51.6
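
If you want to try the queries without hitting the site, here is a minimal offline sketch. The HTML in SAMPLE is hypothetical: it is shaped only so that the queries above match it, and the real page may look different or may have changed. The helper name first_float is also my own, not part of lxml. It shows that xpath() hands back a list, so an empty result can be checked for instead of letting [0] raise an IndexError.

```python
from lxml import html

# Hypothetical markup, shaped only so that the queries from the answer match it.
# The real page may differ and may have changed since the answer was written.
SAMPLE = b"""
<html><body>
<table class="chart_legend small_legend">
  <tbody>
    <tr>
      <td class="candidate"><div><span>44.4</span></div><div>Approve</div></td>
      <td class="candidate"><div><span>51.6</span></div><div>Disapprove</div></td>
    </tr>
  </tbody>
</table>
</body></html>
"""

# Shared prefix of the two queries used in the answer.
BASE = '//table[@class="chart_legend small_legend"]/tbody/tr/td[@class="candidate"]'


def first_float(tree, query):
    """Return the first text match of `query` as a float, or None if nothing matches."""
    matches = tree.xpath(query)  # a list of strings for a text() query
    if not matches:
        return None
    return float(matches[0].strip())


tree = html.fromstring(SAMPLE)
approve = first_float(tree, BASE + '[1]/div[1]/span/text()')
disapprove = first_float(tree, BASE + '[2]/div[1]/span/text()')

# The query from the question, with text() appended; nothing in SAMPLE matches it.
missing = first_float(tree, '//*[@id="polling-data-rcp"]/table/tbody/tr[2]/td[4]/text()')

print("approve: {}".format(approve))
print("disapprove: {}".format(disapprove))
print("missing: {}".format(missing))
```

With this sample the first two calls return the two numbers and the third returns None, which is the same "empty list" situation from the question, just handled explicitly.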

solution1
2 ACCEPTED 2016-01-09 17:00:30
