
Webscraping Click Button Selenium

I am trying to scrape indeed.com for job listings using Python, with Selenium and BeautifulSoup. I want to click through to the next page but can't figure out how. I have looked at many threads, but it is unclear to me which element I am supposed to act on. Below is the page's HTML; the code highlighted in grey is what comes up when I inspect the next button.

[image: enter image description here]

I should also mention that I first tried to follow what happens to the URL when mousedown is executed. After reading the addppurlparam function, I added the strings from that function to the URL myself, but using that URL just throws me back to page one.

Here is my code for the class with selenium meant to click on the button:

from selenium import webdriver
from selenium.webdriver import ActionChains

driver = webdriver.Chrome("C:/Users/alleballe/Downloads/chromedriver.exe")
driver.get("https://se.indeed.com/Internship-jobb")
print(driver.title)
#assert "Python" in driver.title
elem = driver.find_element_by_class_name("pagination-list")
elem = elem.find_element_by_xpath("//li/a[@aria-label='Nästa']")
print(elem)
assert "No results found." not in driver.page_source
assert elem

action = ActionChains(driver).click(elem)
action.perform()
print(elem)

driver.close()

The Indeed site is formatted so that it shows 10 results per page.

Your screenshot shows the wrong section of the HTML. If you look at the pagination links instead, you can see that they contain start=0 for the first page, start=10 for the second, start=20 for the third, and so on.

You could use this to write code like this:

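i = 0  # initialise the offset once, outside the loop
while True:
    driver.get(f'https://se.indeed.com/jobs?q=Internship&start={i}')
    # code here (break out of the loop when a page has no more results)
    i = i + 10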
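
As a minimal sketch of the same idea with a fixed number of pages instead of an endless loop: the helper below builds the URL for each page from the start offset. The base URL and the 10-per-page step come from this answer; the name page_urls and its parameters are hypothetical, chosen only for illustration.

```python
from urllib.parse import urlencode

BASE_URL = "https://se.indeed.com/jobs"  # taken from the answer above
RESULTS_PER_PAGE = 10  # the answer states 10 results per page


def page_urls(query, pages):
    """Yield the search URL for each of the first `pages` result pages."""
    for page in range(pages):
        # start=0 for page one, start=10 for page two, and so on
        params = {"q": query, "start": page * RESULTS_PER_PAGE}
        yield f"{BASE_URL}?{urlencode(params)}"


for url in page_urls("Internship", 3):
    print(url)  # in the Selenium script, pass each URL to driver.get(url)
```

Building the query string with urlencode also takes care of escaping if the search term contains spaces or non-ASCII characters.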

But to answer your question directly, you should do:

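next_page_link = driver.find_element_by_xpath('/html/head/link[6]')
# driver.get() expects a URL string, so pass the element's href, not the element
driver.get(next_page_link.get_attribute('href'))

This will find the link element and then load the URL in its href attribute.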
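
A positional XPath such as link[6] stops working as soon as the page's head changes. Here is a hedged sketch of a less position-dependent alternative, assuming the element in question is a link tag with rel="next" (the thread does not confirm this). It uses only the standard library to parse the page source; the names NextLinkParser and find_next_url are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class NextLinkParser(HTMLParser):
    """Record the href of the first <link rel="next"> tag encountered."""

    def __init__(self):
        super().__init__()
        self.next_href = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        rels = (attrs.get("rel") or "").lower().split()
        if tag == "link" and "next" in rels and self.next_href is None:
            self.next_href = attrs.get("href")


def find_next_url(page_source, current_url):
    """Return the absolute URL of the next page, or None if there is none."""
    parser = NextLinkParser()
    parser.feed(page_source)
    if parser.next_href is None:
        return None
    # Resolve a relative href against the URL of the current page
    return urljoin(current_url, parser.next_href)
```

With a Selenium driver this could be called as find_next_url(driver.page_source, driver.current_url), and a None result can serve as the signal to stop paginating.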

This works for me; it paginates to the next page.

driver.find_element_by_class_name("pagination-list").find_element_by_tag_name('a').click()

