I am trying to extract some data from a website using lxml and requests in Python. Here is the URL:
And here is my code:
from lxml import html
import requests
page = requests.get('https://www.google.com/finance/quote/HPQ:NYSE?comparison=NASDAQ%3AINTC%2CNASDAQ%3AAAPL%2CNASDAQ%3AAVGO%2CNASDAQ%3AQCOM')
tree = html.fromstring(page.content)
price = tree.xpath('//*[@id="yDmH0d"]/c-wiz/div/div[4]/div/div/main/div[2]/c-wiz/div/div[5]/div/div/div/div[1]/div[1]')
However, when I look at price, it is empty. What am I doing wrong?
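This page uses a lot of JavaScript to generate its HTML content. requests only downloads the initial document and lxml does not execute scripts, so elements that the browser builds at runtime (which is what your long XPath points at) are not in the tree you are parsing.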
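To illustrate the point, here is a minimal sketch using a made-up page rather than the real Google Finance markup. The `RAW` string, the `app` and `price` ids, and the value in it are all hypothetical; the only thing it is meant to show is that lxml sees a script as text and never runs it.

```python
from lxml import html

# Hypothetical page: the <div id="price"> would only exist after a browser
# runs the script. None of these ids come from the real site.
RAW = """
<html><body>
<div id="app"></div>
<script>
document.getElementById("app").innerHTML = '<div id="price">28.15</div>';
</script>
</body></html>
"""

tree = html.fromstring(RAW)

# Present in the raw document, so lxml finds it.
static_nodes = tree.xpath('//div[@id="app"]')

# Only created by JavaScript at runtime, so lxml finds nothing.
rendered_only = tree.xpath('//div[@id="price"]')

print(len(static_nodes))  # 1
print(rendered_only)      # []
```

An XPath copied from the browser's element inspector describes the rendered DOM, which is why it can come back empty against the raw response.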
However, if you disable JavaScript or just inspect the first document that comes through in the web inspector (for more on that, see my blog entry), you can see an easy way to access the price: it is stored in a data-last-price attribute, which can be selected with the XPath //*/@data-last-price:
from lxml import html
import requests
page = requests.get('https://www.google.com/finance/quote/HPQ:NYSE?comparison=NASDAQ%3AINTC%2CNASDAQ%3AAAPL%2CNASDAQ%3AAVGO%2CNASDAQ%3AQCOM')
tree = html.fromstring(page.content)
# xpath() returns a list of the matching attribute values (strings)
price = tree.xpath('//*/@data-last-price')
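Since xpath() gives back a list, you will probably want to guard against the attribute being missing and convert the value to a number. A rough sketch of that follows; `extract_last_price` is a helper name I made up, and `SAMPLE` is an invented snippet standing in for the real response, so treat it as an assumption about the page shape rather than what Google actually serves.

```python
from lxml import html


def extract_last_price(page_source):
    """Return the first data-last-price value as a float, or None if absent."""
    tree = html.fromstring(page_source)
    values = tree.xpath('//@data-last-price')
    if not values:
        # Attribute not found: the markup changed or a different page came back.
        return None
    return float(values[0])


# Hypothetical stand-in for page.content; the real markup will differ.
SAMPLE = '<html><body><div data-last-price="28.15">$28.15</div></body></html>'

print(extract_last_price(SAMPLE))                                   # 28.15
print(extract_last_price('<html><body><p>none</p></body></html>'))  # None
```

With the real page you would call extract_last_price(page.content) after the requests.get call above. If it returns None, check page.status_code and the raw HTML first, since the attribute may have been renamed.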