Selenium not able to find particular elements on slow loading page

I am attempting to scrape the website basketball-reference and am running into an issue I can't seem to solve. I am trying to grab the box score element for each game played. I was able to do this easily with urlopen, but because other portions of the site require Selenium, I thought I would rewrite the entire process with Selenium.

The issue seems to be that even if I use WebDriverWait to hold off scraping until I see the first element load, nothing is returned when I then go on to grab the elements.

One thing I found interesting: if I print the full page from my urlopen results with something like print(uClient.read()), I get roughly 300 more lines of HTML after beautifying than when I do the same with print(driver.page_source). This happens even if I set an implicit wait of 5 minutes.

from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC



driver = webdriver.Chrome('/usr/local/bin/chromedriver')
driver.wait = WebDriverWait(driver, 10)
driver.get('https://www.basketball-reference.com/boxscores/')
driver.wait.until(EC.presence_of_element_located((By.XPATH,'//*[@id="content"]/div[3]/div[1]')))


box = driver.find_elements_by_class_name('game_summary expanded nohover')

print (box)

driver.quit()

Try the code below; it works on my machine. Let me know if you still face a problem.

from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Chrome()
driver.wait = WebDriverWait(driver, 60)
driver.get('https://www.basketball-reference.com/boxscores/')
driver.wait.until(EC.presence_of_element_located((By.XPATH, '//*[@id="content"]/div[3]/div[1]')))

boxes = driver.wait.until(
    EC.presence_of_all_elements_located((By.XPATH, "//div[@class=\"game_summary expanded nohover\"]")))

print("Number of Elements Located : ", len(boxes))

for box in boxes:
    print(box.text)
    print("-----------")

driver.quit()

If this resolves your problem, please mark it as the answer. Thanks.
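A likely reason the question's lookup returned nothing (an interpretation, since the answer does not spell it out) is that a class-name locator expects a single class, while 'game_summary expanded nohover' is three classes separated by spaces. The XPath above sidesteps this by matching the whole class attribute string. Another option is a CSS selector that chains the classes. The helper below is a hypothetical illustration, not part of Selenium:

```python
def compound_class_to_css(class_attr):
    """Turn a space-separated class attribute such as 'a b c' into '.a.b.c'.

    Hypothetical helper for illustration; it only builds the selector string.
    """
    return "".join("." + name for name in class_attr.split())


selector = compound_class_to_css("game_summary expanded nohover")
print(selector)

# With a live driver, the selector could then be used like this:
#   from selenium.webdriver.common.by import By
#   boxes = driver.find_elements(By.CSS_SELECTOR, selector)
```

Unlike the exact-match XPath, the chained CSS selector matches the element regardless of the order of its classes or of any extra classes it carries.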

Actually, the site doesn't require Selenium at all. All the data is available through a simple requests call (it is just inside comments in the HTML, so you would only need to parse those). Secondly, you can grab the box scores quite easily with pandas:

import pandas as pd

dfs = pd.read_html('https://www.basketball-reference.com/boxscores/')

for idx, table in enumerate(dfs[:-2]):
    print (table)
    if (idx+1)%3 == 0:
        print("-----------")
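For the point about data sitting inside HTML comments, here is a minimal sketch of how the commented-out markup could be pulled out using only the standard library. The sample markup and the names CommentCollector and commented_out_tables are assumptions made for illustration; the real page structure may differ.

```python
from html.parser import HTMLParser


class CommentCollector(HTMLParser):
    """Collects the body of every HTML comment in a document."""

    def __init__(self):
        super().__init__()
        self.comments = []

    def handle_comment(self, data):
        # Called once per <!-- ... --> block with the text between the markers.
        self.comments.append(data)


def commented_out_tables(html):
    """Return the comment bodies that contain a <table> element."""
    collector = CommentCollector()
    collector.feed(html)
    collector.close()
    return [body for body in collector.comments if "<table" in body]


# Hypothetical sample markup, not copied from the real site.
SAMPLE_HTML = """
<div id="content">
  <!-- just a note -->
  <!--
  <table id="scores"><tr><td>Home</td><td>101</td></tr></table>
  -->
</div>
"""

hidden = commented_out_tables(SAMPLE_HTML)
print(len(hidden))
```

Each recovered fragment is ordinary HTML, so it could then be handed to a table parser, for example pd.read_html wrapped around an io.StringIO of the fragment.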
