
Webscraping Click Button Selenium

I am trying to scrape indeed.com for job listings using Python with Selenium and BeautifulSoup. I want to click through to the next page but can't figure out how to do this. I have looked at many threads, but it is unclear to me which element I am supposed to act on. Here is the page's HTML; the code marked in grey is what comes up when I inspect the next button.

[Image: screenshot of the page HTML in the browser inspector]

I should also mention that I first tried to follow what happens to the URL when mousedown is executed. After reading the addppurlparam function, appending the strings that the function adds, and loading the resulting URL, I just get thrown back to page one.

Here is my code for the class with Selenium, which is meant to click on the button:

from selenium import webdriver
from selenium.webdriver import ActionChains

driver = webdriver.Chrome("C:/Users/alleballe/Downloads/chromedriver.exe")
driver.get("https://se.indeed.com/Internship-jobb")
print(driver.title)
#assert "Python" in driver.title
elem = driver.find_element_by_class_name("pagination-list")
elem = elem.find_element_by_xpath("//li/a[@aria-label='Nästa']")
print(elem)
assert "No results found." not in driver.page_source
assert elem

action = ActionChains(driver).click(elem)
action.perform()
print(elem)

driver.close()

The Indeed site is formatted so that it shows 10 results per page.

Your screenshot shows the wrong section of the HTML. Look at the pagination links instead: they contain start=0 for the first page, start=10 for the second, start=20 for the third, and so on.

You could use this knowledge to write code like this:

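i = 0  # offset of the first result; must be set once, before the loop
while True:
    driver.get(f'https://se.indeed.com/jobs?q=Internship&start={i}')
    # code here (break out of the loop once a page has no more results)
    i = i + 10  # each page shows 10 results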
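
The offset scheme can also be tried out without a browser. This is a minimal sketch, assuming the `q` and `start` query parameters from the URL above and 10 results per page; `page_urls` is a hypothetical helper name, not part of Selenium or Indeed:

```python
from urllib.parse import urlencode

BASE_URL = "https://se.indeed.com/jobs"  # taken from the URL in the answer


def page_urls(query, pages, page_size=10):
    """Return the search URLs for the first `pages` result pages."""
    urls = []
    for page in range(pages):
        # start=0 is page one, start=10 is page two, and so on
        params = urlencode({"q": query, "start": page * page_size})
        urls.append(f"{BASE_URL}?{params}")
    return urls


if __name__ == "__main__":
    for url in page_urls("Internship", 3):
        print(url)
```

Each URL can then be passed to driver.get() in turn, which gives the loop a fixed upper bound instead of running forever.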

But to answer your question directly, you should do:

next_page_link = driver.find_element_by_xpath('/html/head/link[6]')
# driver.get() expects a URL string, so read the element's href attribute
driver.get(next_page_link.get_attribute('href'))

This finds the link element and then loads the URL it points to.
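
A positional XPath such as link[6] stops working as soon as the page adds or removes an element in its head. One possible alternative, assuming the head contains a `<link rel="next" href="...">` element (the post does not confirm this), is to look the link up by its rel attribute. The sketch below uses only the standard library and operates on an HTML string such as driver.page_source; `find_next_href` and the sample HTML are hypothetical:

```python
from html.parser import HTMLParser


class _NextLinkParser(HTMLParser):
    """Remember the href of the first <link rel="next"> element seen."""

    def __init__(self):
        super().__init__()
        self.next_href = None

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        # rel may hold several space-separated tokens, or be missing
        rel = (attr.get("rel") or "").lower().split()
        if tag == "link" and "next" in rel and self.next_href is None:
            self.next_href = attr.get("href")


def find_next_href(html):
    """Return the rel="next" URL from an HTML string, or None if absent."""
    parser = _NextLinkParser()
    parser.feed(html)
    return parser.next_href


if __name__ == "__main__":
    # Made-up sample; with Selenium, pass driver.page_source instead.
    sample = '<html><head><link rel="next" href="/jobs?start=10"></head></html>'
    print(find_next_href(sample))
```

If the returned href is relative, join it with the current page URL using urllib.parse.urljoin before passing it to driver.get().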

This works and paginates to the next page:

driver.find_element_by_class_name("pagination-list").find_element_by_tag_name('a').click()
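
Note that the one-liner clicks the first link inside the pagination list, which may not be the next-page link on every page. Here is a sketch that targets the button by the aria-label from the question ('Nästa'), assuming that label and the pagination-list class are still present on the page; `click_next` is a hypothetical helper. It uses the generic find_element(by, value) call with the string strategy "xpath", because the find_element_by_* helpers were removed in later Selenium 4 releases:

```python
def click_next(driver, label="Nästa"):
    """Click the pagination link whose aria-label equals `label`.

    `driver` is any Selenium WebDriver. Selenium raises
    NoSuchElementException if no matching link is found.
    """
    xpath = (
        "//*[contains(@class, 'pagination-list')]"
        f"//a[@aria-label='{label}']"
    )
    # "xpath" is the same string as selenium.webdriver.common.by.By.XPATH
    link = driver.find_element("xpath", xpath)
    link.click()
    return link
```

Calling click_next(driver) after driver.get(...) should then move to the next results page; pass a different label for other language versions of the site.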
