
How to use selenium to scrape from multiple page clicks

I am writing a Selenium script that will get all filenames in every directory on a website. My approach is to build a list of directory elements and call .click() on each one in turn to access all the filenames. The problem I face is that Selenium does not allow me to click on the next directory after the first. The following code is my approach:

import time  # needed for the sleeps below; `driver` is an already-initialized WebDriver

folders = driver.find_elements_by_class_name("directory")

for folder in folders:
    folder.click()
    time.sleep(2)
    # click below is to navigate back to root directory
    driver.find_element_by_xpath('//*[@id="default-layout"]/div[1]/div/div/div[1]/div[1]/nav/ol/a').click()
    time.sleep(2)

With the above code, I get the following error when Selenium tries to click on the 2nd directory in the list:

selenium.common.exceptions.StaleElementReferenceException: Message: stale element reference: element is not attached to the page document

By opening a link and navigating to another page, the previously collected elements become stale.
To overcome this problem you will need to collect the folders list again each time you come back from the opened page to the main page.
So your code could look like the following:

folders = driver.find_elements_by_class_name("directory")

for index in range(len(folders)):
    # index into the freshly collected list, not a stale reference
    folders[index].click()
    # do what you need to do on the opened page,
    # then get back to the main page...
    time.sleep(2)
    # ...and collect the `folders` list again, since the old
    # references are now stale
    folders = driver.find_elements_by_class_name("directory")
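The re-collection pattern above can be wrapped in a reusable helper. A minimal sketch under stated assumptions: the function name, the `on_page` callback, and the `delay` parameter are my own illustrations, not part of the answer; the breadcrumb XPath is the one from the question.

```python
import time

def click_each_directory(driver, on_page=None, delay=0):
    """Click every '.directory' element in turn, re-locating the list
    on each iteration so a fresh (non-stale) element is clicked."""
    count = len(driver.find_elements_by_class_name("directory"))
    for i in range(count):
        # Re-fetch the list: references collected before the last
        # navigation are stale by now.
        folders = driver.find_elements_by_class_name("directory")
        folders[i].click()
        if on_page is not None:
            on_page(driver)  # e.g. collect filenames on the opened page
        time.sleep(delay)
        # Navigate back to the root directory via the breadcrumb link
        driver.find_element_by_xpath(
            '//*[@id="default-layout"]/div[1]/div/div/div[1]/div[1]/nav/ol/a'
        ).click()
        time.sleep(delay)
```

Counting the elements once and indexing into a freshly fetched list each pass avoids both the stale references and the pitfall of mutating a list while iterating over it.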

This means that the element you are trying to click isn't in the DOM anymore.

A possible solution is to use WebDriverWait, something like this:

from selenium.webdriver.support.ui import WebDriverWait

secs = 2

def wait_until_found(driver):
    # find_element_by_xpath raises NoSuchElementException while the element
    # is absent; WebDriverWait ignores that exception by default and keeps
    # polling, so simply returning the element is enough here.
    return driver.find_element_by_xpath(
        '//*[@id="default-layout"]/div[1]/div/div/div[1]/div[1]/nav/ol/a'
    )

element = WebDriverWait(driver, secs).until(wait_until_found)
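For intuition, WebDriverWait.until just polls the supplied callable until it returns a truthy value or the timeout expires (retrying through NoSuchElementException by default). In stdlib terms the loop is roughly the following sketch; `wait_until`, its parameters, and the use of the built-in TimeoutError (Selenium raises its own TimeoutException) are illustrative, not Selenium API:

```python
import time

def wait_until(driver, condition, timeout, poll=0.5):
    """Call condition(driver) repeatedly until it returns a truthy
    value, or raise once `timeout` seconds have elapsed."""
    end = time.time() + timeout
    while True:
        value = condition(driver)
        if value:
            return value
        if time.time() > end:
            raise TimeoutError("condition not met within %s seconds" % timeout)
        time.sleep(poll)
```

This is why the original if/else in waitUntilFound was redundant: the wait loop itself handles the "not found yet" case by retrying.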
