
How to navigate through pagination? (Selenium)

I want to extract all the information by navigating through the pagination.

The source code of the pagination part is:

<tr class="pagination" valign="middle" align="center">
   <td colspan="9">
      <table>
         <tbody>
            <tr>
               <td><span>1</span></td>
               <td><a href="javascript:__doPostBack('ctl00$cph1$grdRfqSearch','Page$2')">2</a></td>
               <td><a href="javascript:__doPostBack('ctl00$cph1$grdRfqSearch','Page$3')">3</a></td>
               <td><a href="javascript:__doPostBack('ctl00$cph1$grdRfqSearch','Page$4')">4</a></td>
               <td><a href="javascript:__doPostBack('ctl00$cph1$grdRfqSearch','Page$5')">5</a></td>
               <td><a href="javascript:__doPostBack('ctl00$cph1$grdRfqSearch','Page$6')">6</a></td>
               <td><a href="javascript:__doPostBack('ctl00$cph1$grdRfqSearch','Page$7')">7</a></td>
               <td><a href="javascript:__doPostBack('ctl00$cph1$grdRfqSearch','Page$8')">8</a></td>
               <td><a href="javascript:__doPostBack('ctl00$cph1$grdRfqSearch','Page$9')">9</a></td>
               <td><a href="javascript:__doPostBack('ctl00$cph1$grdRfqSearch','Page$10')">10</a></td>
               <td><a href="javascript:__doPostBack('ctl00$cph1$grdRfqSearch','Page$11')">...</a></td>
               <td><a href="javascript:__doPostBack('ctl00$cph1$grdRfqSearch','Page$Last')">Last</a></td>
            </tr>
         </tbody>
      </table>
   </td>
</tr>

And my Selenium script is:

from selenium import webdriver
from webdriver_manager.chrome import ChromeDriverManager
from datetime import date,timedelta
import sqlite3
import os

today = date.today()
yesterday = today - timedelta(days=2)
d3 = yesterday.strftime("%m-%d-%Y")

URL = 'https://www.dibbs.bsm.dla.mil//rfq/rfqdates.aspx?category=recent'
options = webdriver.ChromeOptions()
driver = webdriver.Chrome(ChromeDriverManager().install(),options=options)

driver.get(URL)
driver.find_element_by_id("butAgree").click()
driver.find_element_by_partial_link_text(d3).click()
col1 = driver.find_elements_by_xpath('//table[@id="ctl00_cph1_grdRfqSearch"]/tbody/tr/td[1]')
col2 = driver.find_elements_by_xpath('//table[@id="ctl00_cph1_grdRfqSearch"]/tbody/tr/td[2]')
col3 = driver.find_elements_by_xpath('//table[@id="ctl00_cph1_grdRfqSearch"]/tbody/tr/td[3]')
col4 = driver.find_elements_by_xpath('//table[@id="ctl00_cph1_grdRfqSearch"]/tbody/tr/td[4]')
col5 = driver.find_elements_by_xpath('//table[@id="ctl00_cph1_grdRfqSearch"]/tbody/tr/td[5]')
col6 = driver.find_elements_by_xpath('//table[@id="ctl00_cph1_grdRfqSearch"]/tbody/tr/td[6]')
col7 = driver.find_elements_by_xpath('//table[@id="ctl00_cph1_grdRfqSearch"]/tbody/tr/td[7]')
col8 = driver.find_elements_by_xpath('//table[@id="ctl00_cph1_grdRfqSearch"]/tbody/tr/td[8]')
col9 = driver.find_elements_by_xpath('//table[@id="ctl00_cph1_grdRfqSearch"]/tbody/tr/td[9]')

col1_data = [s.text for s in col1]
col2_data = [s.text for s in col2]
col3_data = [s.text for s in col3]
col4_data = [s.text for s in col4]
col5_data = [s.text for s in col5]
col6_data = [s.text for s in col6]
col7_data = [s.text for s in col7]
col8_data = [s.text for s in col8]
col9_data = [s.text for s in col9]

This code extracts data from one page; I want to extract data from all the pages listed in the pagination.

Don't organize your code like that; avoid the repetition by using a function or a list comprehension:

all_cols = [driver.find_elements_by_xpath(f'//table[@id="ctl00_cph1_grdRfqSearch"]/tbody/tr/td[{i}]') for i in range(1,10)]
all_cols_data = [[s.text for s in col] for col in all_cols]

Now you can access any column's data by indexing into `all_cols_data`.
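When scraping more than one page, the per-page lists also need to be merged into running totals, otherwise each page's data overwrites the previous page's. A minimal sketch (the `merge_page` helper is our own, not part of any library):

```python
def merge_page(all_data, page_cols_data):
    """Append one page's column lists onto the running totals.

    all_data:       list of 9 running lists, one per table column
    page_cols_data: the `all_cols_data` scraped from the current page
    """
    for total, page in zip(all_data, page_cols_data):
        total.extend(page)
    return all_data

# Usage inside the pagination loop (assumes `driver` from above):
# all_data = [[] for _ in range(9)]
# page = [[s.text for s in driver.find_elements_by_xpath(
#             f'//table[@id="ctl00_cph1_grdRfqSearch"]/tbody/tr/td[{i}]')]
#         for i in range(1, 10)]
# merge_page(all_data, page)
```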

You can wrap the code in a while loop with a counter variable `pagination_starting_point`, initialized to 2 and incremented on each iteration.

Code:

from datetime import date, timedelta

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

driver = webdriver.Chrome(driver_path)  # driver_path: path to your chromedriver binary
driver.maximize_window()
driver.implicitly_wait(30)
wait = WebDriverWait(driver, 30)

today = date.today()
yesterday = today - timedelta(days=2)
d3 = yesterday.strftime("%m-%d-%Y")
URL = 'https://www.dibbs.bsm.dla.mil//rfq/rfqdates.aspx?category=recent'
driver.get(URL)
driver.find_element_by_id("butAgree").click()
driver.find_element_by_partial_link_text(d3).click()
pagination_starting_point = 2
while True:
    col1 = driver.find_elements_by_xpath('//table[@id="ctl00_cph1_grdRfqSearch"]/tbody/tr/td[1]')
    col2 = driver.find_elements_by_xpath('//table[@id="ctl00_cph1_grdRfqSearch"]/tbody/tr/td[2]')
    col3 = driver.find_elements_by_xpath('//table[@id="ctl00_cph1_grdRfqSearch"]/tbody/tr/td[3]')
    col4 = driver.find_elements_by_xpath('//table[@id="ctl00_cph1_grdRfqSearch"]/tbody/tr/td[4]')
    col5 = driver.find_elements_by_xpath('//table[@id="ctl00_cph1_grdRfqSearch"]/tbody/tr/td[5]')
    col6 = driver.find_elements_by_xpath('//table[@id="ctl00_cph1_grdRfqSearch"]/tbody/tr/td[6]')
    col7 = driver.find_elements_by_xpath('//table[@id="ctl00_cph1_grdRfqSearch"]/tbody/tr/td[7]')
    col8 = driver.find_elements_by_xpath('//table[@id="ctl00_cph1_grdRfqSearch"]/tbody/tr/td[8]')
    col9 = driver.find_elements_by_xpath('//table[@id="ctl00_cph1_grdRfqSearch"]/tbody/tr/td[9]')

    # NOTE: these lists are rebuilt on every iteration; extend running
    # totals instead if you need the data from all pages at the end.
    col1_data = [s.text for s in col1]
    col2_data = [s.text for s in col2]
    col3_data = [s.text for s in col3]
    col4_data = [s.text for s in col4]
    col5_data = [s.text for s in col5]
    col6_data = [s.text for s in col6]
    col7_data = [s.text for s in col7]
    col8_data = [s.text for s in col8]
    col9_data = [s.text for s in col9]

    wait.until(EC.element_to_be_clickable((By.XPATH, f"//a[contains(@href,'Page${pagination_starting_point}')]"))).click()
    print("Click on page " + str(pagination_starting_point))  # str() needed: cannot concatenate int to str
    pagination_starting_point = pagination_starting_point + 1
    if pagination_starting_point == 45:  # hard-coded page count; adjust to the real number of pages
       break
