Click and scrape 'a href' links by class name using Selenium in Python

I have the following 'a href' links, which are identified only by a class name. I'm trying to have Selenium click through each link in turn, but Selenium isn't returning the proper page source for each link.

<div class="row">
 <div class="item">
  <a href="/path/to/link/" class="link-box">
 <div class="item">
 <div class="item">
 <div class="item">

What am I doing wrong here:

driver = webdriver.Chrome("/Users/me/Downloads/chromedriver", options=options)
driver.get("https://the_website")
link_box = driver.find_elements_by_class_name('link-box')

for i in range(len(link_box)):
  driver.execute_script("arguments[0].click();", link_box[i])
page_source = driver.page_source
pprint(page_source)

I wrote different code to do this.

from selenium import webdriver
from selenium.webdriver.common.keys import Keys
#driver = webdriver.Chrome(executable_path='chromedriver.exe')
driver = webdriver.Firefox(executable_path='geckodriver')
driver.get("url")
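links = []
# Collect every href first, while the original page is still loaded.
for a in driver.find_elements_by_class_name('link-box'):
    links.append(a.get_attribute('href'))
print(links)

# Open each collected URL in its own new tab.
for i, link in enumerate(links):
    driver.execute_script("window.open('');")
    # Tab 0 is the original page, so the i-th new tab is handle i + 1.
    driver.switch_to.window(driver.window_handles[i + 1])
    driver.get(link)
    print(link)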

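First, it collects the href of every element that has the class link-box. Then it opens each of those links in a new tab. Opening each link in its own tab keeps the original page loaded; if you navigated away in the same tab and then tried to use the elements found earlier, Selenium could raise a stale element error. I did this with Firefox, but if you are using Chrome, comment out the webdriver.Firefox line and uncomment the webdriver.Chrome line. Then give the correct path to the driver executable.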
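If you want the page source of each link rather than a set of open tabs, the same "collect the hrefs first, then visit them" idea can be sketched as a small helper. This is only a sketch under a few assumptions: the function name scrape_linked_pages is made up for illustration, the driver is any Selenium WebDriver that has already loaded the listing page, and the locator is passed as the plain string "class name" (the value behind Selenium's By.CLASS_NAME) so the helper itself needs no imports. It visits each URL in the same tab, which is safe here because only the href strings are kept, not the elements.

```python
def scrape_linked_pages(driver, class_name="link-box"):
    """Return {href: page_source} for every link with the given class.

    `driver` is assumed to be a Selenium WebDriver already showing the
    page that contains the links. The function name is hypothetical.
    """
    # "class name" is the string value of selenium's By.CLASS_NAME.
    elements = driver.find_elements("class name", class_name)

    # Read every href before navigating: the element handles become
    # stale as soon as the browser leaves the current page.
    hrefs = [element.get_attribute("href") for element in elements]

    # Drop missing hrefs and duplicates while keeping the page order.
    hrefs = [href for href in dict.fromkeys(hrefs) if href]

    pages = {}
    for href in hrefs:
        driver.get(href)                  # navigate in the same tab
        pages[href] = driver.page_source  # capture the HTML of that page
    return pages
```

With a real driver this would be called right after driver.get("https://the_website"), for example pages = scrape_linked_pages(driver), and each value in pages is the HTML of one linked page. Note that page_source is read immediately after the page loads, so content rendered later by JavaScript may need an explicit wait first.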

