Clicking iteratively while scraping data with Selenium and Python

Question

I am trying to scrape data from this web page:

http://stats.espncricinfo.com/ci/engine/stats/index.html?class=1;team=5;template=results;type=batting

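I need to copy the contents of the table into a CSV file, then go to the next page and append that page's contents to the same file. I can scrape the table, but when I try to loop through the pages by clicking the "Next" button with Selenium WebDriver's click, it goes to the next page and stops. Here is my code.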

from selenium import webdriver
from bs4 import BeautifulSoup

driver = webdriver.Chrome(executable_path = 'path')
url = 'http://stats.espncricinfo.com/ci/engine/stats/index.html?class=1;team=5;template=results;type=batting'

def data_from_cricinfo(url):
    driver.get(url)
    pgsource = str(driver.page_source)
    soup = BeautifulSoup(pgsource, 'html5lib')
    data = soup.find_all('div', class_ = 'engineTable')
    for tr in data:
        info = tr.find_all('tr')
        # grab data

    next_link = driver.find_element_by_class_name('PaginationLink')
    next_link.click()

data_from_cricinfo(url)

Is there a way to click Next on every page using a loop and copy the contents of all the pages into the same file? Thanks in advance.

Answer 1

You can do the following to iterate over all the pages (via the Next button) and parse the data from the table:

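from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.common.exceptions import NoSuchElementException
from selenium.webdriver.common.by import By

URL = 'http://stats.espncricinfo.com/ci/engine/stats/index.html?class=1;team=5;template=results;type=batting'

driver = webdriver.Chrome()
driver.get(URL)

while True:
    # Re-parse the page source on every pass so each newly loaded page is read.
    soup = BeautifulSoup(driver.page_source, 'html5lib')
    # The data table is the third element with class 'engineTable'.
    table = soup.find_all(class_='engineTable')[2]
    for info in table.find_all('tr'):
        data = [item.text for item in info.find_all("td")]
        print(data)

    try:
        # Selenium 4 locator syntax; the find_element_by_* helpers were removed in Selenium 4.3.
        driver.find_element(By.PARTIAL_LINK_TEXT, 'Next').click()
    except NoSuchElementException:
        # No "Next" link is found on the last page, so stop looping.
        break

driver.quit()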
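
The question also asks for the rows from every page to end up in one CSV file, whereas the loop above only prints them. Here is a minimal sketch of one way to do that, assuming Selenium 4's `find_element(By..., ...)` locator style and the same `engineTable` index used above. The names `iter_pages` and `write_pages_to_csv` and the output filename are hypothetical and not part of the original answer.

```python
import csv


def iter_pages(driver):
    """Yield one list of rows per results page, clicking "Next" until it is gone.

    Assumes `driver` has already loaded the first results page.
    """
    # Imported here so the CSV helper below can be used without a browser setup.
    from bs4 import BeautifulSoup
    from selenium.common.exceptions import NoSuchElementException
    from selenium.webdriver.common.by import By

    while True:
        soup = BeautifulSoup(driver.page_source, 'html5lib')
        # Same assumption as the answer: the data table is the third 'engineTable'.
        table = soup.find_all(class_='engineTable')[2]
        yield [[td.text.strip() for td in tr.find_all('td')]
               for tr in table.find_all('tr')]
        try:
            driver.find_element(By.PARTIAL_LINK_TEXT, 'Next').click()
        except NoSuchElementException:
            return  # no "Next" link found, so this was the last page


def write_pages_to_csv(pages, csv_path):
    """Write every non-empty row from every page into a single CSV file.

    `pages` is any iterable of pages, each page being a list of rows.
    Returns the number of rows written.
    """
    written = 0
    with open(csv_path, 'w', newline='', encoding='utf-8') as fh:
        writer = csv.writer(fh)
        for rows in pages:
            for row in rows:
                if row:  # rows without any <td> cells produce an empty list; skip them
                    writer.writerow(row)
                    written += 1
    return written
```

With a driver that has already loaded `URL`, a call such as `write_pages_to_csv(iter_pages(driver), 'batting.csv')` would collect all pages into one file. The file is opened once and kept open across pages, so there is no need to reopen it in append mode for each page.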

Answer 1
1 accepted 2018-02-14 18:12:53
