Iterating over click for the table cells containing the link and finding it by link text while scraping data using selenium and python
I'm trying to scrape data from this ESPNcricinfo stats page (its URL is in the code below).
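I need to copy the contents of the table into a CSV file, then go to the next page and append that page's contents to the same file. I can scrape the table, but when I try to click the "Next" button in a loop using Selenium WebDriver's click, it goes to the next page and stops. Here is my code: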
from selenium import webdriver
from bs4 import BeautifulSoup

driver = webdriver.Chrome(executable_path = 'path')
url = 'http://stats.espncricinfo.com/ci/engine/stats/index.html?class=1;team=5;template=results;type=batting'

def data_from_cricinfo(url):
    driver.get(url)
    pgsource = str(driver.page_source)
    soup = BeautifulSoup(pgsource, 'html5lib')
    data = soup.find_all('div', class_ = 'engineTable')
    for tr in data:
        info = tr.find_all('tr')
        # grab data
    next_link = driver.find_element_by_class_name('PaginationLink')
    next_link.click()

data_from_cricinfo(url)
Is there any way to use a loop to click "Next" on every page and copy the contents of all the pages into the same file? Thanks in advance.
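In the question's code, if the final `data_from_cricinfo(url)` call sits at module level (the indentation was lost, but the "goes to the next page and stops" symptom points that way), the function runs exactly once: it scrapes page 1, clicks the link once, and returns. Nothing repeats it. Moving that call inside the function as recursion would not help either, because every call starts with `driver.get(url)` and reloads page 1. What is needed is a loop that repeats "scrape, then advance" until there is no next page. The control flow can be sketched without Selenium; `scrape_page` and `go_to_next_page` are hypothetical callables standing in for "parse the current table" and "click Next if it exists", and the fake pages are dummy data.

```python
from typing import Callable, List


def scrape_all_pages(scrape_page: Callable[[], List[list]],
                     go_to_next_page: Callable[[], bool]) -> List[list]:
    """Scrape the current page, then keep advancing until go_to_next_page() returns False."""
    rows: List[list] = []
    while True:
        rows.extend(scrape_page())  # collect this page's rows
        if not go_to_next_page():   # no "Next" link: stop instead of reloading page 1
            break
    return rows


# Fake "browser": three pages of dummy rows and a cursor into them
pages = [[["a", 1]], [["b", 2]], [["c", 3]]]
current = {"index": 0}


def fake_scrape():
    return pages[current["index"]]


def fake_next():
    if current["index"] + 1 < len(pages):
        current["index"] += 1
        return True
    return False


print(scrape_all_pages(fake_scrape, fake_next))  # prints [['a', 1], ['b', 2], ['c', 3]]
```

With Selenium, `go_to_next_page` would look for the Next link, click it and return `True`, or return `False` when the link is missing; the answer below does exactly that inline with `while True` and `break`.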
You can do something like the following to go through all the pages (via the "Next" button) and parse the data from the table:
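from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.common.exceptions import NoSuchElementException
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from bs4 import BeautifulSoup

URL = 'http://stats.espncricinfo.com/ci/engine/stats/index.html?class=1;team=5;template=results;type=batting'

driver = webdriver.Chrome()
driver.get(URL)
try:
    while True:
        soup = BeautifulSoup(driver.page_source, 'html5lib')
        # The results table is taken to be the third element with class "engineTable"
        table = soup.find_all(class_='engineTable')[2]
        for info in table.find_all('tr'):
            data = [item.text for item in info.find_all("td")]
            print(data)
        try:
            # Selenium 4 locator API (the find_element_by_* helpers were removed in 4.3)
            next_link = driver.find_element(By.PARTIAL_LINK_TEXT, 'Next')
        except NoSuchElementException:
            break  # no "Next" link: this was the last page
        next_link.click()
        # Wait until the old page is gone before reading page_source again
        WebDriverWait(driver, 10).until(EC.staleness_of(next_link))
finally:
    driver.quit()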
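The answer prints each row, while the question asks for every page's rows in one CSV file. A minimal sketch of the file side, assuming rows arrive as lists of cell strings like `data` above: open the file in append mode and hand the rows to the standard `csv.writer`. `append_rows` and the file names are hypothetical, and the demo rows are dummy values, not Cricinfo data.

```python
import csv
import os
import tempfile
from typing import Iterable, List


def append_rows(path: str, rows: Iterable[List[str]]) -> int:
    """Append non-empty rows to the CSV file at `path`; return how many were written."""
    written = 0
    # Mode "a" appends (creating the file if needed), so each page's rows land
    # after the previous page's. newline="" is what the csv docs recommend
    # for files passed to csv.writer.
    with open(path, "a", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        for row in rows:
            if row:  # rows without <td> cells (e.g. the header) come out as [] and are skipped
                writer.writerow(row)
                written += 1
    return written


# Demo with dummy rows for two "pages", written to a temporary directory
demo_path = os.path.join(tempfile.mkdtemp(), "results.csv")
append_rows(demo_path, [[], ["Player A", "10"]])  # page 1
append_rows(demo_path, [["Player B", "25"]])      # page 2
with open(demo_path, newline="", encoding="utf-8") as f:
    print(list(csv.reader(f)))  # prints [['Player A', '10'], ['Player B', '25']]
```

Inside the answer's `while True` loop, one could build `page_rows = [[td.text for td in tr.find_all('td')] for tr in table.find_all('tr')]` and call `append_rows('results.csv', page_rows)` before looking for the Next link. Because of append mode, rerunning the script adds to an existing file; opening the file once with `'w'` before the loop and reusing a single writer gives a fresh file on each run instead.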