
How to scrape the actual data from a website with headless Chrome in Python

from selenium.webdriver import Chrome
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.keys import Keys

opts = Options()
opts.set_headless()
assert opts.headless  # Operating in headless mode
browser = Chrome(executable_path=r"C:\Users\taksh\AppData\Local\Programs\Python\Python37-32\chromedriver.exe", options=opts)
browser.implicitly_wait(3)
browser.get('https://ca.finance.yahoo.com/quote/AMZN/profile?p=AMZN')

results = browser.find_elements_by_xpath('//*[@id="quote-header-info"]/div[3]/div/div/span[1]')
print(results)

And I get back:

[<selenium.webdriver.remote.webelement.WebElement (session="b3f4e2760ffec62836828e62530f082e", element="3e2741ee-8e7e-4181-9b76-e3a731cefecf")>]

What I actually want Selenium to scrape is the price of the stock. I thought I was doing it correctly because this would find the element when I used Selenium on Chrome without headless mode. How can I scrape the actual data from the website in headless mode?

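find_elements_by_xpath returns a list of WebElement objects, so printing the list only shows their representations. You need to extract the data from each element after getting them in the list.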

results = browser.find_elements_by_xpath('//*[@id="quote-header-info"]/div[3]/div/div/span[1]')

for result in results:
    print(result.text)

This will print the text of every element in the list.
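The same idea can be wrapped in a small helper. This is only a sketch: the names element_texts and scrape_texts are made up for illustration, and it assumes browser is an already-started Selenium driver. It uses the generic find_elements("xpath", ...) form, where "xpath" is the string value of By.XPATH, because newer Selenium 4 releases dropped the find_elements_by_* helpers.

```python
def element_texts(elements):
    """Return the visible text of each element, skipping blank entries."""
    texts = []
    for element in elements:
        text = element.text.strip()
        if text:
            texts.append(text)
    return texts


def scrape_texts(browser, xpath):
    """Look up elements by XPath and return their text, not the WebElement objects."""
    # "xpath" is the string value of selenium's By.XPATH constant.
    elements = browser.find_elements("xpath", xpath)
    return element_texts(elements)
```

With the browser from the question, scrape_texts(browser, '//*[@id="quote-header-info"]/div[3]/div/div/span[1]') would then return a list of strings rather than a list of WebElement objects.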

The same XPath locator may match more than one element in the HTML, so it helps to wrap this code in a try/except block while checking it in headless mode.
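A rough sketch of that guard is below. The function name safe_texts and the expected_errors parameter are hypothetical. The default catches any Exception so the snippet stays self-contained; with Selenium installed you would probably narrow it to classes from selenium.common.exceptions such as NoSuchElementException or StaleElementReferenceException.

```python
def safe_texts(browser, xpath, expected_errors=(Exception,)):
    """Return (texts, error). error is None when the lookup succeeded."""
    try:
        # "xpath" is the string value of selenium's By.XPATH constant.
        elements = browser.find_elements("xpath", xpath)
        # Reading .text can also fail, e.g. if the page re-rendered meanwhile.
        texts = [element.text for element in elements]
    except expected_errors as exc:
        return [], exc
    return texts, None
```

If texts comes back with more than one entry, the XPath matches several nodes. If it comes back empty with no error, nothing matched in the page as rendered in headless mode.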

Headless mode basically works from the HTML only, so to debug more effectively, try a different version of the XPath, for example locating the parent of the span and then traversing down from it.
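One way to try the parent-first approach is sketched here. The helper name texts_under_parent and the default child XPath ".//span" are assumptions for illustration. The leading dot makes the child lookup relative to each parent element instead of the whole document.

```python
def texts_under_parent(browser, parent_xpath, child_xpath=".//span"):
    """Find parent elements first, then collect the text of matching descendants."""
    texts = []
    # "xpath" is the string value of selenium's By.XPATH constant.
    for parent in browser.find_elements("xpath", parent_xpath):
        for child in parent.find_elements("xpath", child_xpath):
            texts.append(child.text)
    return texts
```

For the page in the question, a call such as texts_under_parent(browser, '//*[@id="quote-header-info"]') would list the text of every span under the header block, which makes it easier to see which one holds the price.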

