Can only get first row in table when scraping with Selenium in python
I am trying to scrape ranking data from BGG.
The basic structure of the HTML is as follows:
<table class="collection_table">
  <tbody>
    <tr></tr>
    <tr id="row_"></tr>
    <tr id="row_"></tr>
    <tr id="row_"></tr>
    <tr id="row_"></tr>
    <!--snip-->
    <tr id="row_"></tr>
    <tr id="row_"></tr>
    <tr id="row_"></tr>
  </tbody>
</table>
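Note that every row except the first (the header) has the same id, and there is no additional data that marks a row as unique.
My (current) code is as follows: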
def bgg_scrape_rank_page(browser, bgg_data):
    time.sleep(1)
    table = browser.find_element_by_xpath("//table[@class='collection_table']/tbody")
    row = table.find_element_by_xpath("//tr[@id='row_']")
    while row:
        rank = row.find_element_by_xpath("//td[1]").text
        game_name = row.find_element_by_xpath("//td[3]/div[2]/a").text
        game_page = row.find_element_by_xpath("//td[3]/div[2]/a").get_attribute("href")
        print rank, game_name, game_page
        row = row.find_element_by_xpath("//following-sibling::tr")
I have also tried iterating with:
rows = browser.find_elements_by_xpath("/tr[@id='row_']")
for row in rows:
    rank = row.find_element_by_xpath("//td[1]").text
    game_name = row.find_element_by_xpath("//td[3]/div[2]/a").text
    game_page = row.find_element_by_xpath("//td[3]/div[2]/a").get_attribute("href")
    print rank, game_name, game_page
The problem is that no matter what I try, I only ever print out the first row. I just get row after row of:
1 Pandemic Legacy: Season 1 https://boardgamegeek.com/boardgame/161936/pandemic-legacy-season-1
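The problem is in your XPath: you need to start it with a dot, as in .//, so that it is evaluated relative to the element you call it on. A path that starts with just // is evaluated from the document root (<html>), whichever element you call it on, so find_element always returns the first match on the page. So try this: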
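import time
from selenium.common.exceptions import NoSuchElementException

def bgg_scrape_rank_page(browser, bgg_data):
    time.sleep(1)
    table = browser.find_element_by_xpath("//table[@class='collection_table']/tbody")
    # Leading dot: search relative to `table`, not from the document root
    row = table.find_element_by_xpath(".//tr[@id='row_']")
    while row:
        rank = row.find_element_by_xpath(".//td[1]").text
        game_name = row.find_element_by_xpath(".//td[3]/div[2]/a").text
        game_page = row.find_element_by_xpath(".//td[3]/div[2]/a").get_attribute("href")
        print rank, game_name, game_page
        try:
            # Step to the immediately following sibling row only
            row = row.find_element_by_xpath("./following-sibling::tr[1]")
        except NoSuchElementException:
            row = None  # last row reached, end the loop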
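If you want to see the per-row relative lookup in isolation, here is a minimal sketch that runs without a browser. It is not Selenium code: it uses the standard library's xml.etree.ElementTree as a stand-in, which supports only a subset of XPath, and the markup, game names, and the scrape_rows helper are all made up for illustration. The point is only that each lookup inside the loop starts from the current row (./...), so every row yields its own cells.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified stand-in for the ranking table (not real BGG markup).
HTML = """
<table class="collection_table">
  <tbody>
    <tr><th>Rank</th><th>Thumb</th><th>Title</th></tr>
    <tr id="row_">
      <td>1</td><td></td>
      <td><div></div><div><a href="/boardgame/1/game-a">Game A</a></div></td>
    </tr>
    <tr id="row_">
      <td>2</td><td></td>
      <td><div></div><div><a href="/boardgame/2/game-b">Game B</a></div></td>
    </tr>
  </tbody>
</table>
"""


def scrape_rows(table):
    """Return (rank, name, href) for every data row under `table`."""
    results = []
    # Relative to `table`: all descendant rows with the shared id.
    for row in table.findall(".//tr[@id='row_']"):
        # Relative to the current `row`: its own first and third cells.
        rank = row.find("./td[1]").text
        link = row.find("./td[3]/div[2]/a")
        results.append((rank, link.text, link.get("href")))
    return results


table = ET.fromstring(HTML.strip())
rows = scrape_rows(table)
for rank, name, href in rows:
    print(rank, name, href)
```

The same shape should carry over to Selenium: collect the rows once with a find_elements call, then use a dot-prefixed XPath for each lookup inside the loop.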