
Selenium loop page refreshed in python

I have a question about looping with Selenium in Python. I want to iterate over a list of links found with 'driver.find_elements_by_id' and click on them one by one. The problem is that each time I click on a link (from 'linklist' in the code), the page is refreshed, and I get the error 'Message: The element reference is stale. Either the element is no longer attached to the DOM or the page has been refreshed.'

I know the reason is that the list of links no longer exists after the click. But how can I, in general, keep iterating over the list in Selenium once the original page is gone? I tried 'driver.back()', and apparently it doesn't work.

The error message pops up after this line in the code:

link.click()  

The link list is located at this URL (I want to click on the 'Document' button and then download the first file once the refreshed page is displayed): 'https://www.sec.gov/cgi-bin/browse-edgar?action=getcompany&CIK=0001467373&type=10-K&dateb=20101231&owner=exclude&count=40'

Could someone have a look at this problem? Thank you!

from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
import unittest
import os
import time
from bs4 import BeautifulSoup
from selenium.webdriver.common.keys import Keys
import requests
import html2text



class LoginTest(unittest.TestCase):
 def setUp(self):


    self.driver=webdriver.Firefox()
    self.driver.get("https://www.sec.gov/edgar/searchedgar/companysearch.html")


 def test_Login(self):
    driver=self.driver

    cikID="cik"
    searchButtonID="cik_find"
    typeID="//*[@id='type']"
    priorID="prior_to"
    cik="00001467373"
    Type="10-K"
    prior="20101231"
    search2button="//*[@id='contentDiv']/div[2]/form/table/tbody/tr/td[6]/input[1]"


    documentsbuttonid="documentsbutton"
    formbuttonxpath='//a[text()="d10k.htm"]'


    cikElement=WebDriverWait(driver,30).until(lambda driver:driver.find_element_by_id(cikID))

    cikElement.clear()
    cikElement.send_keys(cik)


    searchButtonElement=WebDriverWait(driver,20).until(lambda driver:driver.find_element_by_id(searchButtonID))
    searchButtonElement.click()

    typeElement=WebDriverWait(driver,30).until(lambda driver:driver.find_element_by_xpath(typeID))
    typeElement.clear()
    typeElement.send_keys(Type)
    priorElement=WebDriverWait(driver,30).until(lambda driver:driver.find_element_by_id(priorID))
    priorElement.clear()
    priorElement.send_keys(prior)
    search2Element=WebDriverWait(driver,30).until(lambda driver:driver.find_element_by_xpath(search2button))
    search2Element.send_keys(Keys.SPACE)
    time.sleep(1)

    documentsButtonElement=WebDriverWait(driver,20).until(lambda driver:driver.find_element_by_id(documentsbuttonid))
    a=driver.current_url



    window_be1 = driver.window_handles[0]
    linklist=driver.find_elements_by_id(documentsbuttonid)


    with open("D:/doc2/"+"a"+".txt", mode="w",errors="ignore") as newfile:


        for link in linklist:

                link.click()            

                formElement=WebDriverWait(driver,30).until(lambda driver:driver.find_element_by_xpath(formbuttonxpath))
                formElement.click()
                time.sleep(1)

                t=driver.current_url

                r = requests.get(t)
                data = r.text

                newfile.write(html2text.html2text(data))

                driver.back()
                driver.back()


 def tearDown(self):
    self.driver.quit()
if __name__=='__main__':
 unittest.main()

You should not keep a list of web elements, but a list of link URLs: the URL strings stay valid after the page changes, while the element references go stale. Try something like this:

linklist = []
for link in driver.find_elements_by_xpath('//h4[@class="title"]/a'):
    linklist.append(link.get_attribute('href'))

You can then iterate through the list of links:

for link in linklist:
    driver.get(link)
    # do some actions on page
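
The two steps above can be wrapped into small helpers. This is only a sketch: `collect_hrefs`, `visit_each` and the `action` callback are hypothetical names, not part of Selenium, and the helpers assume a driver that offers `find_elements(by, value)` and `get(url)`, which Selenium WebDriver does.

```python
def collect_hrefs(driver, by, value):
    """Return the href strings of all elements matching the locator.

    Plain strings are stored instead of web elements, so nothing in
    the result can go stale when the browser navigates away.
    """
    hrefs = []
    for element in driver.find_elements(by, value):
        href = element.get_attribute("href")
        if href:  # skip matches that carry no href
            hrefs.append(href)
    return hrefs


def visit_each(driver, hrefs, action):
    """Load each URL in turn and run action(driver) on the loaded page."""
    results = []
    for href in hrefs:
        driver.get(href)
        results.append(action(driver))
    return results
```

For the page in the question, the call might look like `collect_hrefs(driver, By.ID, "documentsbutton")` with `By` imported from `selenium.webdriver.common.by`, assuming the matched elements are anchors that carry an href.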

If you want to physically click on each link, look the element up again by its href right before clicking:

for link in linklist:
    driver.find_element_by_xpath('//h4[@class="title"]/a[@href="%s"]' % link).click()
    # do some actions on page
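
Matching on href can miss when the page uses relative links, since `get_attribute('href')` may return the resolved absolute URL. An alternative is to click by position and look the elements up again on every pass. The sketch below assumes the list page can be reloaded from its URL and shows the same elements in the same order each time; `click_each_by_index` and `action` are hypothetical names.

```python
def click_each_by_index(driver, by, value, action):
    """Click every matching element in turn, re-locating before each click."""
    results = []
    list_url = driver.current_url  # the page that holds the list of links
    count = len(driver.find_elements(by, value))
    for index in range(count):
        # References found before the last navigation are stale,
        # so fetch a fresh list on every iteration.
        elements = driver.find_elements(by, value)
        if index >= len(elements):
            break  # the list shrank; nothing left at this position
        elements[index].click()
        results.append(action(driver))
        # Reload the list page, however many pages the action opened.
        driver.get(list_url)
    return results
```

Returning with `driver.get(list_url)` rather than `driver.back()` is a design choice here: it does not depend on how many pages the action opened after the click.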
