
Extracting data from a website using lxml, requests, and XPath in Python

I am trying to extract some data from a website using lxml and requests in Python. Here is the URL:

https://www.google.com/finance/quote/HPQ:NYSE?comparison=NASDAQ%3AINTC%2CNASDAQ%3AAAPL%2CNASDAQ%3AAVGO%2CNASDAQ%3AQCOM

And here is my code:

from lxml import html
import requests

page = requests.get('https://www.google.com/finance/quote/HPQ:NYSE?comparison=NASDAQ%3AINTC%2CNASDAQ%3AAAPL%2CNASDAQ%3AAVGO%2CNASDAQ%3AQCOM')
tree = html.fromstring(page.content)
price = tree.xpath('//*[@id="yDmH0d"]/c-wiz/div/div[4]/div/div/main/div[2]/c-wiz/div/div[5]/div/div/div/div[1]/div[1]')

However, when I inspect price, it is empty. What am I doing wrong?

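This page uses a lot of JavaScript to generate its HTML content. requests only downloads the initial document and does not run any scripts, so an XPath copied from the browser's rendered DOM will not match anything in the response.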
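To illustrate the point, here is a minimal sketch using a made-up HTML snippet (an assumption for demonstration, not the real Google Finance markup). The element that would hold the price is created by a script, so it never exists in the markup that lxml parses:

```python
from lxml import html

# Hypothetical page: the price <div> only exists after a browser runs the script.
RAW_HTML = """
<html><body id="app">
  <script>
    var d = document.createElement('div');
    d.className = 'price';
    d.textContent = '28.15';
    document.body.appendChild(d);
  </script>
</body></html>
"""

tree = html.fromstring(RAW_HTML)

# lxml parses the markup as-is and never executes the script,
# so the div a browser would display is not in the parsed tree.
matches = tree.xpath('//div[@class="price"]')
print(matches)
```

The same thing happens with a long positional XPath copied from the browser's inspector: it describes the DOM after scripts have run, not the document that requests receives.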

However, if you disable JavaScript, or just inspect the first document that comes back in the web inspector (for more on that, see my blog entry here), you can see an easy way to access the price: it is stored in a data-last-price attribute in the initial HTML.

It can be read with the XPath //*/@data-last-price:

from lxml import html
import requests

page = requests.get('https://www.google.com/finance/quote/HPQ:NYSE?comparison=NASDAQ%3AINTC%2CNASDAQ%3AAAPL%2CNASDAQ%3AAVGO%2CNASDAQ%3AQCOM')
tree = html.fromstring(page.content)
# xpath() returns a list of matching attribute values (strings)
price = tree.xpath('//*/@data-last-price')
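Since xpath() returns a list of strings, it may help to wrap the lookup in a small helper that handles the empty case and converts the value to a number. The following is only a sketch: extract_last_price and the SAMPLE fragment are hypothetical names introduced for illustration, and the real page's markup may differ or change over time.

```python
from lxml import html

# Hypothetical fragment standing in for the downloaded page.
SAMPLE = '<div><div data-last-price="28.15">HPQ</div></div>'


def extract_last_price(content):
    """Return the first data-last-price value as a float, or None if absent."""
    tree = html.fromstring(content)
    values = tree.xpath('//@data-last-price')  # list of attribute strings
    if not values:
        return None
    try:
        return float(values[0])
    except ValueError:
        # The attribute exists but does not hold a plain number.
        return None


print(extract_last_price(SAMPLE))
```

With the real page, the same helper could be called as extract_last_price(page.content). The expression may match more than one element, so taking the first value is an assumption worth checking against the actual response.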

