Selenium: loop through table pages

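I have been trying to scrape a directory with Selenium and Beautiful Soup, but given how the HTML is written I can't find a good way to loop through the table pages: there is no "next" button, and the button for the currently selected page simply has the class active. Here is my code so far: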

from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from bs4 import BeautifulSoup as bs
import pandas as pd

path_driver = "C:/Users/CS330584/Documents/Documentos de Defesa da Concorrência/Automatização de Processos/chromedriver.exe"
website = "https://sat.sef.sc.gov.br/tax.NET/Sat.Dva.Web/ConsultaPublicaDevedores.aspx"
value_search = "300"

driver = webdriver.Chrome(path_driver)
driver.get(website)

search_max = driver.find_element_by_id("Body_Main_Main_ctl00_txtTotalDevedores")
search_max.send_keys(value_search)

btn_consult = driver.find_element_by_id("Body_Main_Main_ctl00_btnBuscar")
btn_consult.click()

driver.implicitly_wait(10)

i = 1
while True:
    try:
        # some wait
        driver.find_element_by_xpath("//*[@id='Body_Main_Main_grpDevedores_gridView']/tbody/tr[51]/td/ul/li' and .='[]']".format(str(i + 1))).click()
    
    except:
        break 

How can I loop through these table pages efficiently (or even not so efficiently) to scrape the data?

The button for the next page runs this JavaScript code:

javascript:GridView_ScrollToTop('Body_Main_Main_grpDevedores_gridView');__doPostBack('ctl00$ctl00$ctl00$Body$Main$Main$grpDevedores$gridView','Page$1')

You can run the same code yourself to change the page.

You only need to update the number in Page$1, for example with an f-string:

from selenium import webdriver
from selenium.webdriver.common.by import By
import time
import math

path_driver = "C:/Users/CS330584/Documents/Documentos de Defesa da Concorrência/Automatização de Processos/chromedriver.exe"
website = "https://sat.sef.sc.gov.br/tax.NET/Sat.Dva.Web/ConsultaPublicaDevedores.aspx"
value_search = 300

#driver = webdriver.Chrome(path_driver)
driver = webdriver.Firefox()
driver.get(website)

# find_element(By.ID, ...) replaces find_element_by_id, which was deprecated in Selenium 4 and later removed
search_max = driver.find_element(By.ID, "Body_Main_Main_ctl00_txtTotalDevedores")
search_max.send_keys(str(value_search))

btn_consult = driver.find_element(By.ID, "Body_Main_Main_ctl00_btnBuscar")
btn_consult.click()

driver.implicitly_wait(10)


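# assumes the grid shows 50 rows per page, so 300 rows need 6 pages
pages = math.ceil(value_search/50)
print('pages:', pages)

# page 1 is already displayed after the search, so start at page 2
for i in range(2, pages+1):
    try:
        time.sleep(2)  # give the previous page time to load
        driver.execute_script(f"javascript:GridView_ScrollToTop('Body_Main_Main_grpDevedores_gridView');__doPostBack('ctl00$ctl00$ctl00$Body$Main$Main$grpDevedores$gridView','Page${i}')")
    except Exception as ex:
        print(ex)
        break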
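
The loop above only switches pages; it does not collect anything yet. One possible way to get at the data is to read driver.page_source after each postback, roughly as sketched below. The helper names page_count, postback_script and iter_page_sources are made up for this illustration, the 50 rows per page is taken from the code above, and the fixed pause is an assumption (an explicit wait would be more robust).

```python
import math
import time

# Identifiers copied from the page's own pager link.
GRID_ID = "Body_Main_Main_grpDevedores_gridView"
GRID_TARGET = "ctl00$ctl00$ctl00$Body$Main$Main$grpDevedores$gridView"


def page_count(total_rows, rows_per_page=50):
    """Number of grid pages needed to show total_rows rows."""
    return math.ceil(total_rows / rows_per_page)


def postback_script(page):
    """JavaScript that asks the grid to display the given page number.

    The "javascript:" prefix from the link's href is only needed in a URL,
    so it is left out here.
    """
    return (
        f"GridView_ScrollToTop('{GRID_ID}');"
        f"__doPostBack('{GRID_TARGET}','Page${page}')"
    )


def iter_page_sources(driver, total_rows, rows_per_page=50, pause=2.0):
    """Yield the HTML of every grid page, starting with the one already shown.

    driver is anything with execute_script() and page_source,
    e.g. a Selenium WebDriver on which the search has already been run.
    """
    yield driver.page_source  # page 1 is loaded by the search itself
    for page in range(2, page_count(total_rows, rows_per_page) + 1):
        driver.execute_script(postback_script(page))
        time.sleep(pause)  # crude wait for the postback to finish (assumption)
        yield driver.page_source
```

Each yielded string can then be parsed the usual way, for example with `bs(html, "html.parser")` inside `for html in iter_page_sources(driver, value_search):`.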
