
How to scrape a website with an ul-li dropdown in Python?

Based on the question "Scraping a specific website with a search box and javascripts in Python", I'm trying to scrape company ratings from the website https://www.msci.com/esg-ratings/. Specifically, I want to enter a company name in the search box, choose each option offered for that name in the dropdown menu ("RIO TINTO LIMITED" and "RIO TINTO PLC" here for "rio tinto"), and get the picture with the rating located in the top right corner for both.

However, I have trouble handling the ul-li dropdown menu with suggested companies:

from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By

options = webdriver.ChromeOptions()
options.add_argument('-headless')
options.add_argument('-no-sandbox')
options.add_argument('-disable-dev-shm-usage')
options.add_argument('window-size=1920,1080')

wd = webdriver.Chrome(options=options)
wd.get('https://www.msci.com/esg-ratings')

WebDriverWait(wd, 20).until(EC.element_to_be_clickable((By.XPATH, '//*[@id="_esgratingsprofile_keywords"]'))).send_keys("RIO TINTO")
WebDriverWait(wd, 20).until(EC.element_to_be_clickable((By.XPATH, '//*[@id="ui-id-1"]/li[1]'))).click()
#WebDriverWait(wd,10).until(EC.visibility_of_element_located((By.CSS_SELECTOR,"#_esgratingsprofile_esg-ratings-profile-header > div.esg-ratings-profile-header-ratingdata > div.ratingdata-container > div.ratingdata-outercircle.esgratings-profile-header-yellow > div")))
print(wd.find_element_by_xpath('//*[@id="_esgratingsprofile_esg-ratings-profile-header"]/div[2]/div[1]/div[2]/div'))

(The code raises an ElementClickInterceptedException.)

How to access the needed data for both "RIO TINTO LIMITED" and "RIO TINTO PLC"?

I have trouble handling the ul-li dropdown menu with suggested companies

This is expected, since the element that you are targeting is rendered by a dynamic script. You will have to drop options.add_argument('-headless') in order to overcome this.

You also have a problem here

print(wd.find_element_by_xpath('//*[@id="_esgratingsprofile_esg-ratings-profile-header"]/div[2]/div[1]/div[2]/div'))

where you attempt to print the element. Since the target element is an icon rendered by CSS, print() cannot output it; it would only show the WebElement object. Instead, you need to save it as an image, e.g. a .png file:

# Locate the rating icon, then write its screenshot to disk as PNG bytes
icon = wd.find_element(By.XPATH, '//*[@id="_esgratingsprofile_esg-ratings-profile-header"]/div[2]/div[1]/div[2]/div')
with open('filename.png', 'wb') as file:
    file.write(icon.screenshot_as_png)

You can then use the saved image as needed.
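To cover both "RIO TINTO LIMITED" and "RIO TINTO PLC", one possible approach is to repeat the search once per suggestion and save one screenshot per company. The following is only a rough sketch, not tested against the live site. The locators are copied from the question and may have changed. The names save_ratings and png_name are hypothetical helpers. Using a JavaScript click to get around ElementClickInterceptedException is an assumption about what blocks the native click.

```python
import re
from pathlib import Path

URL = "https://www.msci.com/esg-ratings"
SEARCH_BOX_ID = "_esgratingsprofile_keywords"   # locator taken from the question
SUGGESTION_XPATH = '//*[@id="ui-id-1"]/li'      # every <li> in the dropdown
RATING_XPATH = ('//*[@id="_esgratingsprofile_esg-ratings-profile-header"]'
                '/div[2]/div[1]/div[2]/div')    # locator taken from the question


def png_name(company):
    """Build a file name from a company name (hypothetical helper)."""
    return re.sub(r"[^a-z0-9]+", "_", company.lower()).strip("_") + ".png"


def save_ratings(wd, query, out_dir=".", timeout=20):
    """Save one rating screenshot per dropdown suggestion; return the paths."""
    # Imported here so png_name stays usable without Selenium installed.
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    wait = WebDriverWait(wd, timeout)
    saved = []
    index = 0
    while True:
        # Reload and search again so the dropdown is rebuilt for each company.
        wd.get(URL)
        box = wait.until(EC.element_to_be_clickable((By.ID, SEARCH_BOX_ID)))
        box.send_keys(query)
        # Raises TimeoutException if no suggestions become visible.
        items = wait.until(
            EC.visibility_of_all_elements_located((By.XPATH, SUGGESTION_XPATH)))
        if index >= len(items):
            break
        name = items[index].text.strip() or f"{query} {index + 1}"
        # Assumption: a JavaScript click avoids whatever intercepts a native click.
        wd.execute_script("arguments[0].click();", items[index])
        icon = wait.until(
            EC.visibility_of_element_located((By.XPATH, RATING_XPATH)))
        path = Path(out_dir) / png_name(name)
        path.write_bytes(icon.screenshot_as_png)
        saved.append(str(path))
        index += 1
    return saved
```

With the wd driver created as in the question, calling save_ratings(wd, "RIO TINTO") should return the list of saved file paths, one per suggested company, provided the page still matches these locators.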
