Loop through links using Selenium WebDriver (Python)
Afternoon all. Currently trying to use Selenium WebDriver to loop through a list of links on a page. Specifically, it clicks a link, grabs a line of text off that page to write to a file, goes back, and clicks the next link in the list. The following is what I have:
```python
def test_text_saver(self):
    driver = self.driver
    textsave = open("textsave.txt", "w")
    list_of_links = driver.find_elements_by_xpath("//*[@id=\"learn-sub\"]/div[4]/div/div/div/div[1]/div[2]/div/div/ul/li")
    # Initializing link count:
    link_count = len(list_of_links)
    x = 1  # XPath positions are 1-based: li[1], li[2], ...
    while x <= link_count:
        print(x)
        driver.find_element_by_xpath("//*[@id=\"learn-sub\"]/div[4]/div/div/div/div[1]/div[2]/div/div/ul/li[" + str(x) + "]/a").click()
        text = driver.find_element_by_xpath("//*[@id=\"learn-sub\"]/div[4]/div/div/div/div[1]/div[1]/div[1]/h1").text
        textsave.write(text + "\n\n")
        driver.implicitly_wait(5000)
        driver.back()
        x += 1
    textsave.close()
```
When run, it goes to the initial page, and... goes back to the main page, rather than the subpage that it's supposed to. Printing x, I can see it's incrementing three times rather than once. It also crashes after that. I've checked all my XPaths and such, and also confirmed that it's getting the correct count for the number of links in the list.

Any input's hugely appreciated; this is really just to flex my Python/automation skills, since I'm just getting into both. Thanks in advance!
I'm not sure if this will fix the problem, but in general it is better to use `WebDriverWait` rather than `implicitly_wait` (which does not pause the script; it only sets how long each element lookup keeps polling before giving up), since `WebDriverWait.until` will keep calling the supplied function (e.g. `driver.find_element_by_xpath`) until the returned value is not False-ish or the timeout (e.g. 5000 seconds) is reached, at which point it raises a `selenium.common.exceptions.TimeoutException`.
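To make the polling behaviour concrete, here is a simplified, self-contained model of what `WebDriverWait(driver, timeout).until(method)` does. It is a sketch, not Selenium's actual source: the two exception classes are local stand-ins for the ones in `selenium.common.exceptions`, and the function name `until` is used here only for illustration. One detail worth knowing is that `WebDriverWait` ignores `NoSuchElementException` by default, which is why a lambda that calls `find_element_by_xpath` works as the condition even though that call raises, rather than returns a false value, while the element is missing.

```python
import time


class TimeoutException(Exception):
    """Local stand-in for selenium.common.exceptions.TimeoutException."""


class NoSuchElementException(Exception):
    """Local stand-in for selenium.common.exceptions.NoSuchElementException."""


def until(method, driver, timeout, poll_frequency=0.5,
          ignored_exceptions=(NoSuchElementException,)):
    """Simplified model of WebDriverWait(driver, timeout).until(method)."""
    end_time = time.monotonic() + timeout
    while True:
        try:
            value = method(driver)
            if value:  # any truthy result ends the wait and is returned
                return value
        except ignored_exceptions:
            pass  # e.g. the element is not in the DOM yet: keep polling
        if time.monotonic() > end_time:
            raise TimeoutException(f"condition not met within {timeout} s")
        time.sleep(poll_frequency)


# Usage: a fake condition whose element only "appears" on the third poll.
attempts = []


def heading_text(driver):
    attempts.append(driver)
    if len(attempts) < 3:
        raise NoSuchElementException("h1 not loaded yet")
    return "Page heading"


print(until(heading_text, driver=None, timeout=5, poll_frequency=0.01))  # prints: Page heading
```

Exceptions that are not in `ignored_exceptions` (anything other than the missing-element case here) propagate immediately instead of being retried.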
```python
import selenium.webdriver.support.ui as UI

def test_text_saver(self):
    driver = self.driver
    wait = UI.WebDriverWait(driver, 5000)
    with open("textsave.txt", "w") as textsave:
        list_of_links = driver.find_elements_by_xpath("//*[@id=\"learn-sub\"]/div[4]/div/div/div/div[1]/div[2]/div/div/ul/li/a")
        for link in list_of_links:  # 2
            link.click()  # 1
            text = wait.until(
                lambda driver: driver.find_element_by_xpath("//*[@id=\"learn-sub\"]/div[4]/div/div/div/div[1]/div[1]/div[1]/h1").text)
            textsave.write(text + "\n\n")
            driver.back()
```
(# 1) The `wait.until` call is placed directly after `link.click()`, so the heading lookup is retried until it succeeds (or times out) instead of failing on the first attempt.
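One further caveat, going beyond the answer above: after `driver.back()` the page is typically reloaded, so WebElements located before the navigation usually become stale, and clicking the next item from the original `list_of_links` is likely to raise `StaleElementReferenceException`. A minimal sketch that sidesteps this, assuming Selenium 4 (where `find_elements(by, value)` replaces `find_elements_by_xpath`) and assuming every `<a>` in the list carries a real `href`: copy the URLs out first, then open each one with `driver.get`. The function name `save_headings` and its `wait` parameter are hypothetical; the caller passes in a `WebDriverWait`, like the answer's `wait`.

```python
# Sketch assuming Selenium 4; the XPaths are the ones from the question.
XPATH = "xpath"  # same string as selenium.webdriver.common.by.By.XPATH
LINKS_XPATH = '//*[@id="learn-sub"]/div[4]/div/div/div/div[1]/div[2]/div/div/ul/li/a'
HEADING_XPATH = '//*[@id="learn-sub"]/div[4]/div/div/div/div[1]/div[1]/div[1]/h1'


def save_headings(driver, wait, path="textsave.txt"):
    """Visit every link in the list and write each subpage's heading to `path`.

    `wait` is expected to be a WebDriverWait bound to `driver`.
    Returns the headings in the order they were written.
    """
    # Copy the hrefs out as plain strings first: unlike WebElements,
    # strings cannot go stale when the browser navigates away.
    hrefs = [a.get_attribute("href") for a in driver.find_elements(XPATH, LINKS_XPATH)]
    headings = []
    with open(path, "w") as textsave:
        for href in hrefs:
            driver.get(href)  # open the subpage directly, so no driver.back() is needed
            text = wait.until(lambda d: d.find_element(XPATH, HEADING_XPATH).text)
            textsave.write(text + "\n\n")
            headings.append(text)
    return headings
```

Inside the test method this would be called as `save_headings(self.driver, UI.WebDriverWait(self.driver, 10))`. If the links are driven by JavaScript rather than real URLs, the alternative is to re-locate the list after every `driver.back()` instead of reusing the old references.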
(# 2) Instead of using `while x <= link_count: ... x += 1`, it is better to use `for link in list_of_links:`. For one thing, it improves readability. Moreover, you don't really need to care about the number `x`; all you really care about is looping over the links, which is what the for-loop does.
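If a counter is still wanted (the question prints `x` to follow progress, for example), `enumerate` can supply it while the loop stays a plain for-loop over the items. A small illustration with hypothetical page titles standing in for the links; the helper name `progress_lines` is also hypothetical, and `start=1` mirrors the 1-based positions XPath uses for `li[...]`.

```python
def progress_lines(titles):
    """Return one 'x/total: title' line per item; enumerate supplies x."""
    total = len(titles)
    return [f"{x}/{total}: {title}" for x, title in enumerate(titles, start=1)]


for line in progress_lines(["Intro", "Setup", "Usage"]):  # hypothetical titles
    print(line)
# prints:
# 1/3: Intro
# 2/3: Setup
# 3/3: Usage
```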