
Can only get first row in table when scraping with Selenium in Python

I'm trying to scrape rank data from BGG.

The basic structure of the HTML is like:

<table class="collection_table">
  <tbody>
    <tr></tr>
    <tr id="row_"></tr>
    <tr id="row_"></tr>
    <tr id="row_"></tr>
    <tr id="row_"></tr>
    <!--snip-->
    <tr id="row_"></tr>
    <tr id="row_"></tr>
    <tr id="row_"></tr>
  </tbody>
</table>

Note that every row except the first (a header) has the same id, and there is no extra data to mark it as a unique row.

My (current) code is as follows:

def bgg_scrape_rank_page(browser, bgg_data):
    time.sleep(1)
    table = browser.find_element_by_xpath("//table[@class='collection_table']/tbody")
    row = table.find_element_by_xpath("//tr[@id='row_']")
    while row:
        rank = row.find_element_by_xpath("//td[1]").text
        game_name = row.find_element_by_xpath("//td[3]/div[2]/a").text
        game_page = row.find_element_by_xpath("//td[3]/div[2]/a").get_attribute("href")
        print rank, game_name, game_page
        row = row.find_element_by_xpath("//following-sibling::tr")

I have also tried iterating using

rows = browser.find_elements_by_xpath("/tr[@id='row_']")
for row in rows:
    rank = row.find_element_by_xpath("//td[1]").text
    game_name = row.find_element_by_xpath("//td[3]/div[2]/a").text
    game_page = row.find_element_by_xpath("//td[3]/div[2]/a").get_attribute("href")
    print rank, game_name, game_page

The problem is that no matter what I try, only the first row is ever printed. I just get line after line of

1 "Pandemic Legacy: Season 1 https://boardgamegeek.com/boardgame/161936/pandemic-legacy-season-1".

The problem is in your XPath: an expression that starts with // always searches from the root of the document, even when you call it on an element such as row. Prefix it with a dot (.//) so that the search is relative to the element you call it on. So try

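import time
from selenium.common.exceptions import NoSuchElementException

def bgg_scrape_rank_page(browser, bgg_data):
    time.sleep(1)
    table = browser.find_element_by_xpath("//table[@class='collection_table']/tbody")
    # The leading dot makes each XPath relative to the element it is called on.
    row = table.find_element_by_xpath(".//tr[@id='row_']")
    while row:
        rank = row.find_element_by_xpath(".//td[1]").text
        game_name = row.find_element_by_xpath(".//td[3]/div[2]/a").text
        game_page = row.find_element_by_xpath(".//td[3]/div[2]/a").get_attribute("href")
        print rank, game_name, game_page
        try:
            # Step to the next sibling row; stop when there is none left.
            row = row.find_element_by_xpath("./following-sibling::tr[1]")
        except NoSuchElementException:
            row = None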
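
The effect of relative paths can also be checked without a browser. The following is only a minimal sketch in Python 3: it uses the standard library's xml.etree.ElementTree as a stand-in for Selenium, and the sample table, game names and links are made up for illustration. ElementTree supports only a subset of XPath (no following-sibling axis, for example), so it mirrors the for-loop variant from the question rather than the while loop.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data shaped like the table in the question.
SAMPLE = """
<table class="collection_table">
  <tbody>
    <tr><th>Rank</th><th>Thumb</th><th>Title</th></tr>
    <tr id="row_">
      <td>1</td><td/>
      <td><div/><div><a href="/boardgame/1/game-a">Game A</a></div></td>
    </tr>
    <tr id="row_">
      <td>2</td><td/>
      <td><div/><div><a href="/boardgame/2/game-b">Game B</a></div></td>
    </tr>
  </tbody>
</table>
"""


def scrape_rows(table):
    """Return (rank, name, href) for every data row of the table."""
    results = []
    # Find all data rows once, relative to the table element.
    for row in table.findall(".//tr[@id='row_']"):
        # Paths starting with "./" are resolved against the current row,
        # so each iteration reads its own cells.
        rank = row.find("./td[1]").text
        link = row.find("./td[3]/div[2]/a")
        results.append((rank, link.text, link.get("href")))
    return results


table = ET.fromstring(SAMPLE)
rows = scrape_rows(table)
for rank, name, href in rows:
    print(rank, name, href)
```

Each row yields its own rank, name and link because every lookup inside the loop is anchored at the current row. With Selenium 4, where the find_element_by_* helpers are no longer available, the same pattern would presumably be written with browser.find_elements(By.XPATH, ...) for the rows and row.find_element(By.XPATH, ".//td[1]") for the cells.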
