Extracting data from a website using lxml, requests and XPath in Python

Question

I am trying to extract some data from a website using lxml and requests in Python. Here is the URL:

https://www.google.com/finance/quote/HPQ:NYSE?comparison=NASDAQ%3AINTC%2CNASDAQ%3AAAPL%2CNASDAQ%3AAVGO%2CNASDAQ%3AQCOM

And here is my code:

from lxml import html
import requests

page = requests.get('https://www.google.com/finance/quote/HPQ:NYSE?comparison=NASDAQ%3AINTC%2CNASDAQ%3AAAPL%2CNASDAQ%3AAVGO%2CNASDAQ%3AQCOM')
tree = html.fromstring(page.content)
price = tree.xpath('//*[@id="yDmH0d"]/c-wiz/div/div[4]/div/div/main/div[2]/c-wiz/div/div[5]/div/div/div/div[1]/div[1]')

However, when I inspect price, it is an empty list. What am I doing wrong?

Answer 1

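This page uses a lot of JavaScript to generate its HTML content. requests only downloads the initial HTML and does not run that JavaScript, so the elements your XPath points to do not exist in the response.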
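To illustrate why such an XPath comes back empty, here is a minimal sketch. The HTML snippet is made up for the example and is not the real Google Finance markup. lxml only parses the markup it is given and never executes the script, so the element the script would create in a browser is not in the tree.

```python
from lxml import html

# Hypothetical stand-in for a page whose visible content is built by JavaScript.
RAW = """
<html><body>
  <script>
    var d = document.createElement('div');
    d.className = 'price';
    d.textContent = '42.00';
    document.body.appendChild(d);
  </script>
</body></html>
"""

tree = html.fromstring(RAW)

# This div would only exist after a browser runs the script; lxml never does.
rendered_only = tree.xpath('//div[@class="price"]')

# The script element itself is ordinary markup, so it is present in the tree.
scripts = tree.xpath('//script')

print(rendered_only)
print(len(scripts))
```

An XPath copied from the browser's element inspector describes the tree after scripts have run, which is why it can match nothing in the raw response.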

However, if you disable JavaScript, or inspect the first document that comes through in the browser's web inspector, you can see that the price is already present in that initial HTML as a data-last-price attribute.

It can be selected with the XPath //*/@data-last-price:

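from lxml import html
import requests

# Fetch the raw HTML the server sends, before any JavaScript runs
page = requests.get('https://www.google.com/finance/quote/HPQ:NYSE?comparison=NASDAQ%3AINTC%2CNASDAQ%3AAAPL%2CNASDAQ%3AAVGO%2CNASDAQ%3AQCOM')
tree = html.fromstring(page.content)
# Select the data-last-price attribute wherever it occurs; xpath() returns a list of strings
price = tree.xpath('//*/@data-last-price')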
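Because xpath() returns a list of strings for an attribute query, the result still has to be unpacked and converted. The following is a minimal sketch under stated assumptions: the HTML fragment and its values are made up in place of the real response, the helper name first_price is hypothetical, and taking the first match is an assumption you should check against the real page, since other tickers on it may carry the same attribute.

```python
from typing import Optional

from lxml import html

# Made-up fragment standing in for the server response; the values are illustrative.
SAMPLE = """
<html><body>
  <div data-last-price="29.87">HPQ</div>
  <div data-last-price="50.12">INTC</div>
</body></html>
"""


def first_price(document: str) -> Optional[float]:
    """Return the first data-last-price value as a float, or None if absent."""
    tree = html.fromstring(document)
    values = tree.xpath('//*/@data-last-price')  # list of attribute strings
    if not values:
        return None
    return float(values[0])


price = first_price(SAMPLE)
missing = first_price("<html><body><p>no price here</p></body></html>")
print(price, missing)
```

Returning None when nothing matches keeps a change in the page markup from surfacing as an IndexError further down.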
