Loop through links using Selenium Webdriver (Python)

Afternoon all. I'm currently trying to use Selenium WebDriver to loop through a list of links on a page. Specifically, it clicks a link, grabs a line of text off that page to write to a file, goes back, and clicks the next link in the list. The following is what I have:

    def test_text_saver(self):
        driver = self.driver
        textsave = open("textsave.txt","w")
        list_of_links = driver.find_elements_by_xpath("//*[@id=\"learn-sub\"]/div[4]/div/div/div/div[1]/div[2]/div/div/ul/li")
        """Initializing Link Count:"""
        link_count = len(list_of_links)
        while x <= link_count:
            print x
            driver.find_element_by_xpath("//*[@id=\"learn-sub\"]/div[4]/div/div/div/div[1]/div[2]/div/div/ul/li["+str(x)+"]/a").click()
            text = driver.find_element_by_xpath("//*[@id=\"learn-sub\"]/div[4]/div/div/div/div[1]/div[1]/div[1]/h1").text
            textsave.write(text+"\n\n")
            driver.implicitly_wait(5000)
            driver.back()
            x += 1
        textsave.close()

When run, it goes to the initial page, and... goes back to the main page, rather than the subpage that it's supposed to. Printing x, I can see it's incrementing three times rather than once. It also crashes after that. I've checked all my XPaths, and also confirmed that it's getting the correct count for the number of links in the list.

Any input's hugely appreciated; this is really just to flex my Python/automation skills, since I'm just getting into both. Thanks in advance!!

I'm not sure if this will fix the problem, but in general it is better to use WebDriverWait rather than implicitly_wait, since WebDriverWait.until keeps calling the supplied function (e.g. driver.find_element_by_xpath) until the returned value is no longer falsy or the timeout (e.g. 5000 seconds) is reached, at which point it raises a selenium.common.exceptions.TimeoutException.
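To make that polling behaviour concrete, here is a rough pure-Python sketch of the idea. It is not Selenium's code: poll_until and the local TimeoutException class are made-up stand-ins for illustration, and the real WebDriverWait.until additionally passes the driver to the callable and ignores NoSuchElementException by default.

```python
import time


class TimeoutException(Exception):
    """Hypothetical stand-in for selenium.common.exceptions.TimeoutException."""


def poll_until(condition, timeout, poll_interval=0.5, ignored=()):
    """Call condition() repeatedly until it returns a truthy value.

    Illustration only. `ignored` is a tuple of exception types that are
    treated as "not ready yet" instead of being raised to the caller.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            value = condition()
            if value:
                return value  # truthy result: stop polling and hand it back
        except ignored:
            pass  # e.g. the element is not on the page yet
        if time.monotonic() >= deadline:
            raise TimeoutException("condition not met within %s seconds" % timeout)
        time.sleep(poll_interval)
```

The point is that the wait ends as soon as the condition holds, so a generous timeout only costs time when something is actually wrong. Applied to the test in the question, the same idea with the real WebDriverWait looks like this: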

    import selenium.webdriver.support.ui as UI

    def test_text_saver(self):
        driver = self.driver
        # Explicit wait: polls until the condition is truthy or 5000 seconds pass.
        wait = UI.WebDriverWait(driver, 5000)
        with open("textsave.txt","w") as textsave:
            list_of_links = driver.find_elements_by_xpath("//*[@id=\"learn-sub\"]/div[4]/div/div/div/div[1]/div[2]/div/div/ul/li/a")
            for link in list_of_links:  # 2
                link.click()   # 1
                # Wait for the heading on the linked page, then read its text.
                text = wait.until(
                    lambda driver: driver.find_element_by_xpath("//*[@id=\"learn-sub\"]/div[4]/div/div/div/div[1]/div[1]/div[1]/h1").text)
                textsave.write(text+"\n\n")
                driver.back()

  1. After you click the link, you should wait until the linked URL has loaded, so the call to wait.until is placed directly after link.click().
  2. Instead of using

     while x <= link_count: ... x += 1 

    it is better to use

     for link in list_of_links: 

    For one thing, it improves readability. Moreover, you really don't need to care about the number x; all you really care about is looping over the links, which is what the for-loop does.
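
One caveat with looping over elements that were found before navigating: once the browser leaves the page and comes back, those element references can go stale, and clicking one then raises a StaleElementReferenceException. A possible way around it, sketched below, is to read the href of every link first and visit the URLs directly. This assumes each link is a plain anchor with an href (not a JavaScript-only click handler). The name save_headings and its parameters are made up for this example, and find_elements("xpath", ...) is used because the string "xpath" is the value of By.XPATH.

```python
def save_headings(driver, wait, links_xpath, heading_xpath, out):
    """Visit every link matched by links_xpath and write each page's heading to out.

    driver: a Selenium WebDriver (only find_elements, find_element and get are used)
    wait:   a WebDriverWait built for that driver
    out:    an open, writable text file
    Returns the number of links visited.
    """
    # Collect the URLs up front. Plain strings survive navigation, whereas
    # element objects are tied to the page they were found on.
    urls = [a.get_attribute("href")
            for a in driver.find_elements("xpath", links_xpath)]
    for url in urls:
        driver.get(url)  # load the subpage directly, so no driver.back() is needed
        # Wait until the heading exists on the new page, then read its text.
        text = wait.until(lambda d: d.find_element("xpath", heading_xpath).text)
        out.write(text + "\n\n")
    return len(urls)
```

Inside the test this would be called with self.driver, a UI.WebDriverWait for that driver, the two XPaths from the question (the list one ending in /ul/li/a), and the open textsave file.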
