
Using selenium to display 'next' search results using jscript __doPostBack links

In the search results of the jobquest site ( http://jobquest.detma.org/JobQuest/Training.aspx ), I would like to use selenium to click the "next" link so that the next paginated results table of 20 records loads. I can only scrape as far as the first 20 results. Here are the steps that got me that far:

Step 1: I load the opening page.

import requests, re
from bs4 import BeautifulSoup
from selenium import webdriver

browser = webdriver.Chrome('../chromedriver')
url ='http://jobquest.detma.org/JobQuest/Training.aspx'
browser.get(url)

Step 2: I find the search button and click it to request a search with no search criteria. After this code, the search results page loads with the first 20 records in a table:

submit_button = browser.find_element_by_id('ctl00_ctl00_bodyMainBase_bodyMain_btnSubmit')
submit_button.click()

Step 3: Now on the search results page, I create some soup and use find_all to get the correct rows:

html = browser.page_source
soup = BeautifulSoup(html, "html.parser")

rows = soup.find_all("tr",{"class":"gvRow"})

At this point, I can fetch my data (job IDs) from the first page of results using the rows object like this:

id_list = []

for row in rows:
    # slice the job ID out of the string form of the row's first <a> tag
    temp = str(row.find("a"))[33:40]
    id_list.append(temp)

QUESTION - Step 4, help!! To reload the table with the next 20 results, I have to click the "next" link on the results page. I used Chrome to inspect it and got these details:

<a href="javascript:__doPostBack('ctl00$ctl00$bodyMainBase$bodyMain$egvResults$ctl01$ctl08','')">Next</a>

I need code to programmatically click on Next and remake the soup with the next 20 records. I expect that if I could figure this out, I could figure out how to loop the code to get all ~1515 IDs in the database.

UPDATE: The line that worked for me, suggested in the answer, is:

WebDriverWait(browser, 10).until(EC.element_to_be_clickable((By.CSS_SELECTOR, '[href*=ctl08]'))).click()

Thank you, this was very useful.
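For reference, one way to fold that working line into a loop that collects every page is sketched below. This is an unverified sketch: it assumes the "Next" link keeps the ctl08 substring in its href on every page, that a timeout waiting for it means the last page has been reached, and it uses a fixed sleep in place of a proper wait for the refreshed table.

import time
from bs4 import BeautifulSoup
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

id_list = []
while True:
    # scrape the rows currently shown on this page of results
    soup = BeautifulSoup(browser.page_source, "html.parser")
    for row in soup.find_all("tr", {"class": "gvRow"}):
        id_list.append(str(row.find("a"))[33:40])
    try:
        # click "Next"; a timeout is treated as "no more pages"
        WebDriverWait(browser, 10).until(
            EC.element_to_be_clickable((By.CSS_SELECTOR, '[href*=ctl08]'))
        ).click()
    except TimeoutException:
        break
    time.sleep(2)  # crude pause for the postback to refresh the table (assumption)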

You can use an attribute = value selector to target the href. In this case I match the substring at the end of the href via the contains (*) operator.

WebDriverWait(browser, 10).until(EC.element_to_be_clickable((By.CSS_SELECTOR, '[href*=ctl08]'))).click()

I add in a wait-for-clickable condition as a precautionary measure. You could probably remove that.

Additional imports:

from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By

Without wait condition:

browser.find_element_by_css_selector('[href*=ctl08]').click()
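Another option, since the link's href is just a javascript:__doPostBack call (shown in the question), would be to fire that postback directly with execute_script. A minimal sketch, untested against this site, reusing the event target string from the inspected link:

browser.execute_script(
    "__doPostBack('ctl00$ctl00$bodyMainBase$bodyMain$egvResults$ctl01$ctl08','')"
)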

Another way:

Alternatively, you could initially set the page results count to 100 (the maximum) and then loop through the page dropdown to load each new page of results (then you don't need to worry about how many pages there are):

import requests, re
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By

browser = webdriver.Chrome()
url ='http://jobquest.detma.org/JobQuest/Training.aspx'
browser.get(url)
submit_button = browser.find_element_by_id('ctl00_ctl00_bodyMainBase_bodyMain_btnSubmit')
submit_button.click()
# switch the results-per-page dropdown to 100 (the maximum)
WebDriverWait(browser, 10).until(EC.element_to_be_clickable((By.CSS_SELECTOR, '[value="100"]'))).click()
html = browser.page_source
soup = BeautifulSoup(html, "html.parser")
rows = soup.find_all("tr",{"class":"gvRow"})
id_list=[]

for row in rows:
    temp = str(row.find("a"))[33:40]
    id_list.append(temp)

# option elements from the pager dropdown(s) inside the results grid
elems = browser.find_elements_by_css_selector('#ctl00_ctl00_bodyMainBase_bodyMain_egvResults select option')
i = 1
while i < len(elems) / 2:
    # select page number i in the pager dropdown to load that page of results
    browser.find_element_by_css_selector('#ctl00_ctl00_bodyMainBase_bodyMain_egvResults select option[value="' + str(i) + '"]').click()
    # do stuff with new page
    i += 1

You can decide what to do with the row info extracted from each page; this was just to give you an easy framework for looping over all the pages.
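For example, the "do stuff with new page" placeholder could re-make the soup and extend id_list the same way as for the first page. A sketch only; you may also need a short wait after each dropdown click so the new rows have rendered before the page source is read:

    # body for the "do stuff with new page" placeholder above (inside the while loop)
    soup = BeautifulSoup(browser.page_source, "html.parser")
    for row in soup.find_all("tr", {"class": "gvRow"}):
        id_list.append(str(row.find("a"))[33:40])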
