Only getting the first row of a table when scraping with Selenium in Python

Question

I am trying to scrape ranking data from BGG.

The basic structure of the HTML is as follows:

<table class="collection_table">
  <tbody>
    <tr></tr>
    <tr id="row_"></tr>
    <tr id="row_"></tr>
    <tr id="row_"></tr>
    <tr id="row_"></tr>
    <!--snip-->
    <tr id="row_"></tr>
    <tr id="row_"></tr>
    <tr id="row_"></tr>
  </tbody>
</table>

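Note that every row except the first (the header) has the same id, and there is no additional data that marks any row as unique.

My (current) code is as follows: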

def bgg_scrape_rank_page(browser, bgg_data):
    time.sleep(1)
    table = browser.find_element_by_xpath("//table[@class='collection_table']/tbody")
    row = table.find_element_by_xpath("//tr[@id='row_']")
    while row:
        rank = row.find_element_by_xpath("//td[1]").text
        game_name = row.find_element_by_xpath("//td[3]/div[2]/a").text
        game_page = row.find_element_by_xpath("//td[3]/div[2]/a").get_attribute("href")
        print rank, game_name, game_page
        row = row.find_element_by_xpath("//following-sibling::tr")

I have also tried iterating with:

rows = browser.find_elements_by_xpath("/tr[@id='row_']")
for row in rows:
    rank = row.find_element_by_xpath("//td[1]").text
    game_name = row.find_element_by_xpath("//td[3]/div[2]/a").text
    game_page = row.find_element_by_xpath("//td[3]/div[2]/a").get_attribute("href")
    print rank, game_name, game_page

The problem is that no matter what I try, only the first row is ever printed. I just get row after row of:

1 Pandemic Legacy: Season 1 https://boardgamegeek.com/boardgame/161936/pandemic-legacy-season-1

Answer 1

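The problem is in your XPath: you need to start each expression with a dot, as in .//, so that it is evaluated relative to the element you call it on. A bare // always searches from the document root (<html>), no matter which element you call it on, which is why every lookup returns the first row. So try: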

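import time
from selenium.common.exceptions import NoSuchElementException

def bgg_scrape_rank_page(browser, bgg_data):
    time.sleep(1)
    table = browser.find_element_by_xpath("//table[@class='collection_table']/tbody")
    # The leading dot makes each XPath relative to the element it is called on.
    row = table.find_element_by_xpath(".//tr[@id='row_']")
    while row:
        rank = row.find_element_by_xpath(".//td[1]").text
        game_name = row.find_element_by_xpath(".//td[3]/div[2]/a").text
        game_page = row.find_element_by_xpath(".//td[3]/div[2]/a").get_attribute("href")
        print rank, game_name, game_page
        try:
            # Step to the next sibling row; the last row has none.
            row = row.find_element_by_xpath("./following-sibling::tr[1]")
        except NoSuchElementException:
            row = None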
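
The effect of the leading dot can be checked without a browser. The following is only a minimal sketch, not Selenium code: it uses Python's standard xml.etree.ElementTree on a made-up two-row miniature of the table. The second row, its URL, the header cells and the names SAMPLE and scrape_rows are all invented for illustration. Each lookup starts with a dot, so it is resolved against the current row rather than the whole document.

```python
import xml.etree.ElementTree as ET

# Hypothetical miniature of the ranking table. The first data row comes from
# the question's output; the header cells and the second row are invented.
SAMPLE = """
<table class="collection_table"><tbody>
  <tr><th>Rank</th><th>Thumb</th><th>Title</th></tr>
  <tr id="row_"><td>1</td><td/><td><div/><div><a href="https://boardgamegeek.com/boardgame/161936/pandemic-legacy-season-1">Pandemic Legacy: Season 1</a></div></td></tr>
  <tr id="row_"><td>2</td><td/><td><div/><div><a href="https://example.com/game-two">Example Game Two</a></div></td></tr>
</tbody></table>
""".strip()


def scrape_rows(table):
    """Return a (rank, name, url) tuple for every data row of the table."""
    results = []
    # Relative search: only rows below `table` that carry the shared id.
    for row in table.findall(".//tr[@id='row_']"):
        # "./td[1]" is resolved against this row, not the document root.
        rank = row.find("./td[1]").text
        link = row.find("./td[3]/div[2]/a")
        results.append((rank, link.text, link.get("href")))
    return results


table = ET.fromstring(SAMPLE)
rows = scrape_rows(table)
for rank, name, url in rows:
    print(rank, name, url)
```

In Selenium 4, where the find_element_by_xpath helpers no longer exist, the matching calls would be browser.find_elements(By.XPATH, ...) for the rows and row.find_element(By.XPATH, "./td[1]") inside the loop, with By imported from selenium.webdriver.common.by. The rule about the leading dot is the same.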


Solution 1
0 Accepted 2017-01-30 17:46:48
