
How to scrape Yahoo Finance search auto-suggestion results with Selenium (Python)?

I am trying to use Selenium with Python to search Yahoo Finance automatically. When I type a few characters, a list of suggestions pops up, similar to Google's search suggestions.

https://finance.yahoo.com/

I found a list element at the following XPath that should contain Yahoo's suggestions:

//*[@id="search-assist-input"]/div[2]/ul

The suggestion content seems to be inside this list, but it is invisible: when I click to expand it, it just disappears. I don't know whether Firefox or Chrome has some sort of "always expand nodes" option, but these elements seem very hard to reach. When I try to get all the children of this element, no elements are found:

from chrome_driver.chrome import Chrome

driver = Chrome().get_driver()
driver.get('https://finance.yahoo.com/')
driver.find_elements_by_xpath("//div[@id='search-assist-input']/div/input")[0].send_keys('goog')
x = driver.find_elements_by_xpath("//div[@data-reactid='56']/ul[@data-reactid='57']/*")

How can I get these auto-suggestions from the search box?

To extract the auto-suggestions for a search text such as GOOG in the search box of https://finance.yahoo.com/, you have to induce a WebDriverWait for the suggestions to become visible. You can use the following solution:

  • Code block:

     from selenium import webdriver
     from selenium.webdriver.chrome.options import Options
     from selenium.webdriver.chrome.service import Service
     from selenium.webdriver.support.ui import WebDriverWait
     from selenium.webdriver.common.by import By
     from selenium.webdriver.support import expected_conditions as EC

     options = Options()
     options.add_argument("start-maximized")
     options.add_argument("disable-infobars")
     options.add_argument("--disable-extensions")
     # Selenium 4 style: the chromedriver path is passed through a Service object
     driver = webdriver.Chrome(service=Service(r'C:\WebDrivers\ChromeDriver\chromedriver_win32\chromedriver.exe'), options=options)
     driver.get('https://finance.yahoo.com/')
     # Wait until the search box can receive input, then type the query
     WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.XPATH, "//input[@name='p']"))).send_keys("goog")
     # Wait until every <li> of the suggestion list following the search box is visible
     yahoo_fin_auto_suggestions = WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.XPATH, "//input[@name='p']//following::div[1]/ul//li")))
     for item in yahoo_fin_auto_suggestions:
         print(item.text)
  • Console output:

     GOOG Alphabet Inc.Equity - NASDAQ
     GOOGL Alphabet Inc.Equity - NASDAQ
     GOOGL-USD.SW AlphabetEquity - Swiss
     GOOGL180518C01080000 GOOGL May 2018 call 1080.000Option - OPR
     GOOG.MX Alphabet Inc.Equity - Mexico
     GOOG180525C01075000 GOOG May 2018 call 1075.000Option - OPR
     GOOG180518C00720000 GOOG May 2018 call 720.000Option - OPR
     GOOGL180518C01120000 GOOGL May 2018 call 1120.000Option - OPR
     GOOGL.MX Alphabet Inc.Equity - Mexico
     GOOGL190621C01500000 GOOGL Jun 2019 call 1500.000Option - OPR
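Why does the question's `find_elements` call come back empty? Presumably the suggestion dropdown is only populated after the site reacts to the typed text, so a lookup made right after `send_keys` can run before any `<li>` exists. `WebDriverWait(...).until(...)` handles this by calling the expected condition repeatedly until it returns something truthy or the timeout expires. The sketch below is a simplified pure-Python model of that polling idea, not Selenium's actual implementation; `wait_until` and the simulated `suggestions_rendered` condition are hypothetical names used only for illustration.

```python
import time


def wait_until(condition, timeout=20.0, poll_frequency=0.5):
    """Call condition() until it returns something truthy, then return that value.

    Simplified model of the polling behind WebDriverWait.until (whose default
    poll frequency is also 0.5 s). Selenium raises TimeoutException; this
    sketch raises the built-in TimeoutError instead.
    """
    deadline = time.monotonic() + timeout
    while True:
        value = condition()
        if value:
            return value
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} s")
        time.sleep(poll_frequency)


# Simulated page (hypothetical): the suggestion list stays empty
# until 0.3 s after the text was typed.
typed_at = time.monotonic()


def suggestions_rendered():
    if time.monotonic() - typed_at < 0.3:
        return []  # what an immediate find_elements call would see
    return ["GOOG", "GOOGL"]


print(suggestions_rendered())  # [] (checked too early, as in the question)
result = wait_until(suggestions_rendered, timeout=2, poll_frequency=0.05)
print(result)  # ['GOOG', 'GOOGL']
```

Returning the condition's value (rather than just `True`) is what lets the answer above assign the located elements directly from `until(...)`.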

Since the source of the https://finance.yahoo.com/ website appears to have changed, I have adjusted @DebanjanB's answer in three places:

  1. Clicking the button that accepts cookies (submits consent)
  2. The XPath of the search field (at least for Germany/EU)
  3. The XPath of the suggestion list

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC

options = Options()
options.add_argument("start-maximized")
options.add_argument("disable-infobars")
options.add_argument("--disable-extensions")
#options.add_argument('headless') #optional for headless driver

# Selenium 4 style: the chromedriver path is passed through a Service object
driver = webdriver.Chrome(service=Service(r'C:\Program Files (x86)\Google\Chrome\Chromedriver\chromedriver.exe'), options=options)
driver.get('https://finance.yahoo.com/')
driver.find_element(By.XPATH, "//button[@type='submit' and @value='agree']").click()  # accept the cookie-consent banner

WebDriverWait(driver, 5).until(EC.element_to_be_clickable((By.XPATH, "//input[@name='yfin-usr-qry']"))).send_keys("goog")
yahoo_fin_auto_suggestions = WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.XPATH, '//div[@class="_0ea0377c _4343c2a0 _50f34a35"]')))
for item in yahoo_fin_auto_suggestions:
    print(item.text)
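`find_element` raises `NoSuchElementException` when nothing matches, so the consent click above fails in any session where the banner is not shown (assuming, for example, that consent was already given or the banner is not displayed in your region). A minimal sketch of making that click optional relies on `find_elements`, which returns an empty list instead of raising. `accept_consent_if_present`, `FakeDriver` and `FakeButton` are hypothetical names; the two fake classes only stand in for a real WebDriver so the logic can be tried without a browser.

```python
# "xpath" is the value of selenium.webdriver.common.by.By.XPATH; a plain
# string is used so this sketch also runs where Selenium is not installed.
XPATH = "xpath"
CONSENT_BUTTON = "//button[@type='submit' and @value='agree']"


def accept_consent_if_present(driver, locator=CONSENT_BUTTON):
    """Click the cookie-consent button if the page shows one.

    Returns True if a button was clicked, False if none was found.
    """
    buttons = driver.find_elements(XPATH, locator)  # [] when there is no banner
    if buttons:
        buttons[0].click()
        return True
    return False


class FakeButton:
    """Hypothetical stand-in for a WebElement."""

    def __init__(self):
        self.clicked = False

    def click(self):
        self.clicked = True


class FakeDriver:
    """Hypothetical stand-in for a WebDriver, for illustration only."""

    def __init__(self, buttons):
        self.buttons = buttons

    def find_elements(self, by, value):
        return list(self.buttons)


button = FakeButton()
print(accept_consent_if_present(FakeDriver([button])), button.clicked)  # True True
print(accept_consent_if_present(FakeDriver([])))  # False
```

With a real driver, the `driver.find_element(...).click()` line above would become `accept_consent_if_present(driver)`.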

Below is a revised version that accounts for the latest changes to Yahoo Finance.

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC

options = Options()
options.add_argument("start-maximized")
options.add_argument("disable-infobars")
options.add_argument("--disable-extensions")

options.page_load_strategy = 'eager'  # return once the DOM is ready instead of waiting for every resource
options.add_argument('--ignore-certificate-errors')
options.add_argument('--ignore-ssl-errors')
options.add_argument('log-level=3')
latest_news = ['Go to Latest News']  # text of the list that holds no suggestions

chrome_path = r"C:\Python\SYS\chromedriver.exe"  # raw string so the backslashes stay literal
driver = webdriver.Chrome(service=Service(chrome_path), options=options)
driver.get('https://finance.yahoo.com/')

WebDriverWait(driver, 5).until(EC.element_to_be_clickable((By.XPATH, "//input[@name='yfin-usr-qry']"))).send_keys("goog")
# Wait until the dropdown shows its "Symbols" heading, i.e. the suggestions have rendered
WebDriverWait(driver, 20).until(EC.text_to_be_present_in_element((By.XPATH, '//*[@id="header-search-form"]/div[2]/div[1]/div/div[1]/h3'), 'Symbols'))

# The first matching list can be the "Go to Latest News" link; fall back to the second list in that case
yahoo_fin_auto_suggestions = driver.find_elements(By.CLASS_NAME, 'modules_list__1zFHY')[0].text.split('\n')
if yahoo_fin_auto_suggestions == latest_news:
    yahoo_fin_auto_suggestions = driver.find_elements(By.CLASS_NAME, 'modules_list__1zFHY')[1].text.split('\n')

print(yahoo_fin_auto_suggestions)

driver.quit()
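The `[0]`/`[1]` fallback above raises `IndexError` when fewer lists are present, and it only skips a non-suggestion list in the first position. A slightly more defensive variant, sketched under the assumption that each matching `modules_list__1zFHY` element's `.text` is a newline-separated string as in the code above, scans the lists in order and returns the first one that is neither empty nor just the "Go to Latest News" link. `pick_suggestions` is a hypothetical helper, and the sample strings are made up.

```python
LATEST_NEWS = ["Go to Latest News"]


def pick_suggestions(list_texts, skip=LATEST_NEWS):
    """Return the lines of the first list text that holds real suggestions.

    list_texts is assumed to look like
    [el.text for el in driver.find_elements(By.CLASS_NAME, 'modules_list__1zFHY')],
    i.e. one newline-separated string per matching list element.
    """
    for text in list_texts:
        lines = text.split("\n")
        # Skip empty lists and the list that only contains the news link
        if text.strip() and lines != list(skip):
            return lines
    return []  # no suggestion list found


# Made-up sample input, for illustration only
print(pick_suggestions(["Go to Latest News", "GOOG\nGOOGL"]))  # ['GOOG', 'GOOGL']
print(pick_suggestions(["GOOG\nGOOGL"]))                        # ['GOOG', 'GOOGL']
print(pick_suggestions(["Go to Latest News"]))                  # []
```

In the script above, the two lookups could then be replaced by `yahoo_fin_auto_suggestions = pick_suggestions([el.text for el in driver.find_elements(By.CLASS_NAME, 'modules_list__1zFHY')])`.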
