Finding element in Selenium + Python with repeating classes

I'm using Selenium + Python to scrape match results from a Battlefy page so that I can process them later and enter them into a database. I'm using Selenium to get the team names and results because the page loads its content dynamically with JavaScript, which requires a (headless) browser. However, when I try to get the text of each college by its class name, Selenium's find_elements_by_class_name method doesn't seem to work.

Web page: https://battlefy.com/college-league-of-legends/2020-north-conference/5de98dd4196d1311d9e6edbd/stage/5e23b6e395e72856dac06997/bracket/1

Current code:

>>> from selenium import webdriver
>>> chrome_path = r"C:\Users\...\chromedriver.exe"
>>> driver = webdriver.Chrome(chrome_path)
>>> driver.get("https://battlefy.com/college-league-of-legends/2020-north-conference/5de98dd4196d1311d9e6edbd/stage/5e23b6e395e72856dac06997/bracket/1")
>>> teams = driver.find_elements_by_class_name("team-name overflow-ellipsis float-right")
>>> for item in teams:
...     print(item.text)
...

This prints nothing, and the call returns an empty list. I must be doing something wrong. How can I scrape the text of each team name when the element is identified by several class names?

team-name overflow-ellipsis float-right is a combination of three classes. When you use the find_elements_by_class_name / find_element_by_class_name methods, the Selenium library converts the locator to a CSS selector internally. Hence you have to replace every space between the class names with a dot (.).

Try the following:

team = driver.find_elements_by_class_name("team-name.overflow-ellipsis.float-right")
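The rule in this answer, replacing each space in the class attribute with a dot, can be captured in a small helper. This is a sketch: compound_class_locator and compound_class_css are hypothetical names, not Selenium APIs. The first produces the value to pass to the class-name locator (Selenium adds the leading dot itself), the second a complete CSS selector suitable for By.CSS_SELECTOR.

```python
def compound_class_locator(class_attr: str) -> str:
    """Join the classes of a raw class attribute with dots.

    Hypothetical helper: the result is meant for the class-name locator,
    which gets a single leading "." added by Selenium.
    """
    # split() with no argument also collapses repeated or leading/trailing whitespace
    return ".".join(class_attr.split())


def compound_class_css(class_attr: str) -> str:
    """Build a full CSS selector that requires every listed class to be present."""
    return "." + compound_class_locator(class_attr)


if __name__ == "__main__":
    raw = "team-name overflow-ellipsis float-right"
    print(compound_class_locator(raw))  # team-name.overflow-ellipsis.float-right
    print(compound_class_css(raw))      # .team-name.overflow-ellipsis.float-right
```

Because CSS class selectors do not depend on order, the same selector matches the element even if the page lists the three classes in a different order.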

Edit 1:

In Selenium's implementation, a class-name locator is prefixed with . and converted to By.CSS_SELECTOR internally. That is why you don't have to add . before the first class name.
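To see why the original call returned nothing, the rewrite described above can be mimicked in plain Python. This is a simplified model, not Selenium's actual source: rewrite_class_name_locator and descendant_steps are hypothetical functions, and the strings "class name" and "css selector" are the values of Selenium's By.CLASS_NAME and By.CSS_SELECTOR constants. With the unmodified class string, the prefixed value becomes .team-name overflow-ellipsis float-right. In CSS a space is the descendant combinator, so the browser looks for <float-right> elements inside <overflow-ellipsis> elements inside an element with class team-name, and finds none.

```python
from __future__ import annotations

# Values of Selenium's By.CLASS_NAME and By.CSS_SELECTOR constants.
CLASS_NAME = "class name"
CSS_SELECTOR = "css selector"


def rewrite_class_name_locator(by: str, value: str) -> tuple[str, str]:
    """Simplified model of how a class-name locator is turned into CSS."""
    if by == CLASS_NAME:
        # Only one "." is prepended; the rest of the value is passed through untouched.
        return CSS_SELECTOR, "." + value
    return by, value


def descendant_steps(selector: str) -> list[str]:
    """Split a simple selector on whitespace, i.e. on descendant combinators.

    Simplification: only valid for selectors without quoted attribute values.
    """
    return selector.split()


if __name__ == "__main__":
    for value in ("team-name overflow-ellipsis float-right",
                  "team-name.overflow-ellipsis.float-right"):
        by, css = rewrite_class_name_locator(CLASS_NAME, value)
        print(by, repr(css), descendant_steps(css))
```

The first value yields three descendant steps (a class, then two tag names), while the dotted value yields one step that requires all three classes on the same element.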


To scrape the names of the teams using Selenium and Python, you have to use WebDriverWait to wait for visibility_of_all_elements_located(), and you can use either of the following locator strategies:

  • Using CSS_SELECTOR:

     driver.get("https://battlefy.com/college-league-of-legends/2020-north-conference/5de98dd4196d1311d9e6edbd/stage/5e23b6e395e72856dac06997/bracket/1")
     print([my_elem.text for my_elem in WebDriverWait(driver, 5).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, ".team-name.overflow-ellipsis.float-right")))])
  • Using XPATH:

     driver.get("https://battlefy.com/college-league-of-legends/2020-north-conference/5de98dd4196d1311d9e6edbd/stage/5e23b6e395e72856dac06997/bracket/1")
     print([my_elem.text for my_elem in WebDriverWait(driver, 5).until(EC.visibility_of_all_elements_located((By.XPATH, "//div[@class='team-name overflow-ellipsis float-right']")))])
  • Console Output:

     ['Cougars', 'University of Illinois at Urbana-Champaign', 'Maryville Esports', 'Michigan State University', 'Purdue University', 'Illinois Wesleyan Titans', 'UMN Varsity Gold', 'UC LoL A Team', 'Arbor Esports', 'CWRU 300 Spartans', 'Bethany Esports', 'BGC at OSU', 'University of Wisconsin', 'CGC UIC', 'Indiana University - Purdue University Indianapolis - High Tempo Gaming', 'Missouri State University', 'KSU Wildcats', 'University of Manitoba Bisons', 'Nebraska', 'S&T eSports', 'Illinois State University - Redbird Esports', 'WUSTL Bears', 'University of Iowa A Team', 'TSUES', 'Division 2+', 'Grizzlies', 'Principia College esports', 'Northwestern Varsity', 'Wright State University - Raiders', 'Milwaukee School of Engineering - Raiders', 'UPIKE Esports', 'UMDads', 'Jayhawk Esports', 'NKU Esports', 'Warriors', 'Spartans', 'ND Lol', 'SDSU Team Alpha', 'Rose-Hulman', 'SIUe eSports', 'UND', 'MTU GOLD', 'Polar Bears', 'Purdue Fort Wayne Esports', 'CSU LOL', 'Aquinas Esports', 'Shawnee State Bears', 'Lewis Flyers', 'NDSU League of Legends Club', 'South Dakota Mines - Hardrockers', 'GVSU Laker Legends', 'G&E Club @ Iowa State University', 'MVC Vikings', 'Match from North (Dukes)']
  • Note: You have to add the following imports:

     from selenium.webdriver.support.ui import WebDriverWait
     from selenium.webdriver.common.by import By
     from selenium.webdriver.support import expected_conditions as EC

The technical posts on this site are licensed under CC BY-SA 4.0. If you need to reprint this content, please indicate the site URL or the original address. For any questions, contact: yoyou2525@163.com.

 