How to use Selenium to scrape from multiple page clicks
I am writing a Selenium script that collects all filenames in every directory on a website. My approach is to build a list of directory elements and .click() each one in turn to reach its filenames. The problem is that Selenium does not let me click the next directory after the first one. The following code is my approach:
folders = driver.find_elements_by_class_name("directory")
for folder in folders:
    folder.click()
    time.sleep(2)
    # click below is to navigate back to root directory
    driver.find_element_by_xpath('//*[@id="default-layout"]/div[1]/div/div/div[1]/div[1]/nav/ol/a').click()
    time.sleep(2)
With the above code, I get the following error when Selenium tries to click on the second directory in the list:
selenium.common.exceptions.StaleElementReferenceException: Message: stale element reference: element is not attached to the page document
Opening a link and navigating to another page makes the previously collected elements stale.
To overcome this, collect the folders list again each time you return from the opened page to the main page.
So your code can look like this:
folders = driver.find_elements_by_class_name("directory")
for index in range(len(folders)):
    folders[index].click()
    # do what you need to do on the opened page
    # then get back to the main page
    time.sleep(2)
    # and collect the `folders` list again
    folders = driver.find_elements_by_class_name("directory")
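The same idea can be written against the Selenium 4 locator API, where `find_elements_by_class_name` was removed in favor of `find_elements(by, value)`. The sketch below is one possible way to do it, not the exact code from this answer: `visit_each_folder`, `go_back`, and `on_open` are hypothetical names, and the string `"class name"` is the value behind `By.CLASS_NAME`. It looks the list up again on every pass and indexes into the fresh result, so no element reference outlives a page change.

```python
import time


def visit_each_folder(driver, folder_locator, go_back, on_open, pause=2.0):
    """Click every folder once, re-locating the list after each navigation.

    driver         -- anything exposing Selenium 4's find_elements(by, value)
    folder_locator -- (by, value) tuple such as ("class name", "directory"),
                      i.e. the same thing as (By.CLASS_NAME, "directory")
    go_back        -- hypothetical callback that returns to the root listing
    on_open        -- hypothetical callback run while the folder is open
    """
    results = []
    index = 0
    while True:
        # Fresh lookup on every pass: references from the previous
        # page load are stale and must not be reused.
        folders = driver.find_elements(*folder_locator)
        if index >= len(folders):
            break
        folders[index].click()
        time.sleep(pause)
        results.append(on_open(driver))
        go_back(driver)
        time.sleep(pause)
        index += 1
    return results
```

With a real driver, `go_back` could click the breadcrumb link from the question or call `driver.back()`, and `on_open` could return the filenames found on the opened page.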
This means that the element you are trying to click is no longer in the DOM.
A possible solution is to use WebDriverWait, something like this:
from selenium.webdriver.support.ui import WebDriverWait

secs = 2

def waitUntilFound(driver):
    # While the link is missing, the lookup raises NoSuchElementException,
    # which WebDriverWait ignores by default and retries until `secs` elapse
    element = driver.find_element_by_xpath('//*[@id="default-layout"]/div[1]/div/div/div[1]/div[1]/nav/ol/a')
    if element:
        return element
    else:
        return False

element = WebDriverWait(driver, secs).until(waitUntilFound)
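To see what an explicit wait does for you, here is a plain-Python sketch of the polling loop. It is not Selenium's implementation: `wait_for_first` is a hypothetical helper, and it raises the built-in `TimeoutError` where `WebDriverWait` raises Selenium's `TimeoutException`. It relies on `find_elements(by, value)` returning an empty list when nothing matches, so there is no exception to catch while the page is still loading.

```python
import time


def wait_for_first(driver, locator, timeout=2.0, poll=0.5):
    """Poll until `locator` matches at least one element, then return it.

    locator is a (by, value) tuple, e.g. ("xpath", "//nav//a").
    """
    deadline = time.monotonic() + timeout
    while True:
        matches = driver.find_elements(*locator)
        if matches:
            return matches[0]
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"no element matched {locator!r} within {timeout}s"
            )
        # Sleep between attempts instead of hammering the browser.
        time.sleep(poll)
```

In a real script `WebDriverWait(driver, secs).until(...)` is still the better choice. Note that waiting only confirms that an element is present; it does not refresh references collected before the navigation, so the folder list still has to be located again after each return to the main page.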