

python selenium loop through some links

I have an array of links. I am trying to visit every link, print something from it, return to the main page, then visit the second link, and keep doing the same until I have gone through all the links in the array.

What happens is that only the first link works; it is as if all the other links in the array have disappeared. I get this error:

File "e:\work\MY CODE\scraping\learn.py", line 25, in theprint
    link.click()

from selenium import webdriver
# Service passes the chromedriver path to Chrome (Selenium 4 style)
from selenium.webdriver.chrome.service import Service
# By lets us locate elements, e.g. By.CLASS_NAME
from selenium.webdriver.common.by import By
# make us able to wait for an event to happen before running the next line of code
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

# get the Google Chrome driver path (raw string so the backslashes are kept as-is)
PATH = r"E:\work\crom\chromedriver.exe"
# pass the path to the Selenium webdriver
driver = webdriver.Chrome(service=Service(PATH))
# get the link of the site we want
driver.get("https://app.dealroom.co/companies.startups/f/client_focus/anyof_business/company_status/not_closed/company_type/not_government%20nonprofit/employees/anyof_2-10_11-50_51-200/has_website_url/anyof_yes/slug_locations/anyof_france?sort=-revenue")

# wait (up to 10 s) for the page to load, then get the links I want info from
the_links = WebDriverWait(driver, 10).until(
    EC.presence_of_all_elements_located((By.CLASS_NAME, "table-list-item"))
)

# save the hrefs as plain strings first: they stay valid after we leave this page
links = []
for link in the_links:
    href = link.get_attribute('href')
    if href:  # skip items that have no href attribute
        links.append(href)

# go to each link and print something from it
for link in links:
    driver.get(link)
    website = WebDriverWait(driver, 10).until(
        EC.presence_of_element_located((By.CLASS_NAME, "item-details-info__url"))
    )
    print(website.text)
    # no driver.back() needed: the next driver.get() opens the next link directly

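Your code throws a StaleElementReferenceException because a WebElement is tied to the page (DOM) it was found on. When you navigate to another page, the browser builds a new DOM, and the variables holding elements of the previous page become unusable, typically even after you come back with driver.back(). That is why only the first click works: the second iteration uses an element from a page that no longer exists.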

So you can either store the href of every link first (plain strings, which stay valid after navigation) in a list and then loop over that list, like this:

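# store the hrefs (plain strings) while still on the main page
# (By comes from: from selenium.webdriver.common.by import By)
links = []
the_links = driver.find_elements(By.CLASS_NAME, "table-list-item")
for link in the_links:
    links.append(link.get_attribute('href'))

# then visit each stored URL directly
for link in links:
    driver.get(link)
    print("do something on this link")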
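For reference, here is a minimal sketch of the same "collect the hrefs first, then visit" approach wrapped in a reusable function. It assumes a Selenium 4 driver (any object with `get(url)`, `find_elements(by, value)` and `find_element(by, value)`); the function name `scrape_detail_texts` and its parameters are hypothetical, and the class names are the ones from the question. The locator is written as the plain string `"class name"`, which is the value of Selenium's `By.CLASS_NAME`.

```python
CLASS_NAME = "class name"  # same string value as Selenium's By.CLASS_NAME


def scrape_detail_texts(driver, list_url, link_class="table-list-item",
                        detail_class="item-details-info__url"):
    """Hypothetical helper: visit every link on list_url, return one text per page."""
    driver.get(list_url)
    # Read the hrefs right away: they are plain strings, so unlike WebElements
    # they stay valid after the browser navigates away from this page.
    hrefs = [el.get_attribute("href")
             for el in driver.find_elements(CLASS_NAME, link_class)]
    hrefs = [h for h in hrefs if h]  # drop elements without an href
    texts = []
    for href in hrefs:
        driver.get(href)  # navigate directly; no driver.back() needed
        texts.append(driver.find_element(CLASS_NAME, detail_class).text)
    return texts
```

With a real browser you would pass a `webdriver.Chrome()` instance and the question's URL, then print each returned string. Returning a list instead of printing inside the loop keeps the scraping separate from what you do with the results.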

Or you can keep your current click-and-back approach in a while loop and, after each driver.back(), populate the_links again with find_elements, so that you always click a freshly located element.
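A hedged sketch of that second option, assuming the main page shows the same items in the same order every time you come back to it (if it loads more items or reorders them, the indexes will drift). The function `click_through_by_index` and its parameters are hypothetical; it uses a `for` loop over an index, which does the same job as the `while` loop described above.

```python
import time

CLASS_NAME = "class name"  # same string value as Selenium's By.CLASS_NAME


def click_through_by_index(driver, link_class="table-list-item",
                           detail_class="item-details-info__url", pause=0.0):
    """Hypothetical helper: click every list item in turn, return one text per item.

    Only the integer index is kept between iterations; the list elements are
    located again after every driver.back(), so none of them is stale when clicked.
    """
    texts = []
    count = len(driver.find_elements(CLASS_NAME, link_class))
    for index in range(count):
        # Re-populate the_links on the freshly loaded main page.
        the_links = driver.find_elements(CLASS_NAME, link_class)
        the_links[index].click()
        texts.append(driver.find_element(CLASS_NAME, detail_class).text)
        driver.back()
        # Crude fixed pause, as in the question; an explicit wait is more robust.
        time.sleep(pause)
    return texts
```

This costs one extra page load per item compared with storing the hrefs, so it is mainly useful when the items have no href and can only be clicked.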

Karim, is the element with class name "item-details-info__url" present on all pages? Also, what error is the get() method throwing?
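Regarding the first question: if some pages lack that element, `driver.find_element` raises `NoSuchElementException` and stops the whole loop. A minimal sketch (the helper names `detail_text_or_none` and `collect_websites` are hypothetical) that checks for it with `find_elements`, which returns an empty list instead of raising. On pages that render their content with JavaScript after loading, an empty result can also mean "not loaded yet", so an explicit wait may still be needed.

```python
CLASS_NAME = "class name"  # same string value as Selenium's By.CLASS_NAME


def detail_text_or_none(driver, url, detail_class="item-details-info__url"):
    """Open url and return the first detail element's text, or None if absent."""
    driver.get(url)
    # find_elements returns [] when nothing matches, whereas find_element
    # would raise NoSuchElementException and end the whole scraping loop.
    matches = driver.find_elements(CLASS_NAME, detail_class)
    return matches[0].text if matches else None


def collect_websites(driver, links):
    """Map each link to its website text (None where the page has no such element)."""
    return {link: detail_text_or_none(driver, link) for link in links}
```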

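Notice: the technical posts on this site follow the CC BY-SA 4.0 license. If you repost, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.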

 