
How do I wait for this website to load using Selenium in Python?

I'm currently using BeautifulSoup in my Python web-scraping project. However, one of the pages I need to scrape requires interacting with a JavaScript element, so I have to use Selenium (which I'm not very familiar with). This is my code so far:

from selenium.webdriver import Firefox
from selenium.webdriver.firefox.options import Options
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
opts = Options()
opts.add_argument('--headless')
seleniumDriver = Firefox(options=opts, executable_path=r'D:\Programs\Python\Scripts\geckodriver.exe')

seleniumDriver.get("https://www.thecompleteuniversityguide.co.uk/courses/details/computing-bsc/57997898")
driverWait = WebDriverWait(seleniumDriver, 10)
driverWait.until(EC.invisibility_of_element_located((By.ID, "mainBody")))

moduleButton = seleniumDriver.find_element_by_xpath("//div[@class='mdldv']")#.find_element_by_tag_name("span")
print("MODULE BUTTON:", moduleButton)
moduleButton.click()

seleniumDriver.close()

Currently I'm getting a timeout error, even though I'm certain that the element with ID mainBody exists. (I don't know how to use the By class, so I have no idea how it works.) Error message:

Traceback (most recent call last):
  File "D:/Web Scraping/selenium tests.py", line 12, in <module>
    driverWait.until(EC.invisibility_of_element_located((By.ID, "mainBody")))
  File "D:\Programs\Python\lib\site-packages\selenium\webdriver\support\wait.py", line 80, in until
    raise TimeoutException(message, screen, stacktrace)
selenium.common.exceptions.TimeoutException: Message: 

You are calling:

driverWait.until(EC.invisibility_of_element_located((By.ID, "mainBody")))

Per the docs, this waits until the element is either invisible or no longer present in the DOM:

class invisibility_of_element_located(object):
    """ An Expectation for checking that an element is either invisible or not
    present on the DOM.

    locator used to find the element
    """

The TimeoutException that was raised means the element stayed present and visible for the entire 10-second wait: it was never removed from the DOM and never became invisible.
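To see why the wait timed out, here is a simplified, browser-free model of how a polling wait behaves. It is only a sketch: wait_until, WaitTimeout and the page dictionary are hypothetical stand-ins invented for this illustration, not Selenium's actual code or API.

```python
import time


class WaitTimeout(Exception):
    """Hypothetical stand-in for Selenium's TimeoutException."""


def wait_until(condition, timeout=1.0, poll=0.05):
    """Call condition() repeatedly until it returns a truthy value.

    Raises WaitTimeout if that does not happen within `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise WaitTimeout(f"condition not met within {timeout}s")
        time.sleep(poll)


# Toy "DOM" (an assumption for this sketch): the element exists and is displayed.
page = {"mainBody": {"displayed": True}}


def is_present():
    """Models a presence check: true as soon as the element exists."""
    return "mainBody" in page


def is_invisible():
    """Models an invisibility check: true only if absent or hidden."""
    element = page.get("mainBody")
    return element is None or not element["displayed"]


if __name__ == "__main__":
    print(wait_until(is_present))  # satisfied on the first poll
    try:
        wait_until(is_invisible, timeout=0.3)
    except WaitTimeout as exc:
        print("timed out:", exc)
```

Because the element is present and visible the whole time, the invisibility condition never becomes true, so that wait can only end with a timeout.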

What you need is to wait until the element is found (that is, until it is part of the DOM). Use presence_of_element_located instead:

driverWait.until(EC.presence_of_element_located((By.ID, "mainBody")))

A TimeoutException is raised if the element is not found within the timeout given when driverWait was created.

(I don't know how to use the By class, so I have no idea how it will work)

By is what find_element_by_xpath, find_element_by_id, find_element_by_css_selector and the other find_element_by_* helpers use under the hood.

In your case, when you use EC, you pass a locator: a tuple of the lookup strategy (By.ID) and the value to search for. It is equivalent to calling find_element_by_id('yourValue').
