简体   繁体   中英

Selenium not able to find particular elements on slow loading page

I am attempting to scrape the website basketball-reference and am running into an issue I can't seem to solve. I am trying to grab the box score element for each game played. This is something I was able to easily do with urlopen but b/c other portions of the site require Selenium I thought I would rewrite the entire process with Selenium

Issue seems to be that even if I wait to scrape until I to see the first element load using WebDriverWait, when I then move forward to grabbing the elements I get nothing returned.

One thing I found interesting is if I did a full site print using my results from urlopen w/ something like print (uClient.read()) I would get roughly 300 more lines of html after beautifying compared to doing the same with print (driver.page_source). Even if I put an ImplicitlyWait set for 5 minutes.

from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC



driver = webdriver.Chrome('/usr/local/bin/chromedriver')
driver.wait = WebDriverWait(driver, 10)
driver.get('https://www.basketball-reference.com/boxscores/')
driver.wait.until(EC.presence_of_element_located((By.XPATH,'//*[@id="content"]/div[3]/div[1]')))


box = driver.find_elements_by_class_name('game_summary expanded nohover')

print (box)

driver.quit()

Try the below code, it is working in my computer. Do let me know if you still face problem.

from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Chrome()
driver.wait = WebDriverWait(driver, 60)
driver.get('https://www.basketball-reference.com/boxscores/')
driver.wait.until(EC.presence_of_element_located((By.XPATH, '//*[@id="content"]/div[3]/div[1]')))

boxes = driver.wait.until(
    EC.presence_of_all_elements_located((By.XPATH, "//div[@class=\"game_summary expanded nohover\"]")))

print("Number of Elements Located : ", len(boxes))

for box in boxes:
    print(box.text)
    print("-----------")

driver.quit()

If it resolves your problem then please mark it as answer. Thanks

Actually the site doesn't require selenium at all. All the data is there through a simple requests (it's just in the Comments of the html, would just need to parse that). Secondly, you can grab the box scores quite easily with pandas

import pandas as pd

dfs = pd.read_html('https://www.basketball-reference.com/boxscores/')

for idx, table in enumerate(dfs[:-2]):
    print (table)
    if (idx+1)%3 == 0:
        print("-----------")

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM