
Scraping each element from each row of an HTML table

Link to website: http://www.tennisabstract.com/cgi-bin/player-classic.cgi?p=RafaelNadal

I am trying to write code that goes through each row in a table and extracts each element from that row. I am aiming for output in the following layout:

Row1Element1, Row1Element2, Row1Element3 
Row2Element1, Row2Element2, Row2Element3
Row3Element1, Row3Element2, Row3Element3

I have made two major attempts at coding this.

Attempt 1:

rows = driver.find_elements_by_xpath('//table//body//tr')
elements = rows.find_elements_by_xpath('//td')
# this gets all rows in the table, but then gets all elements on the page,
# not just the table

Attempt 2:

driver.find_elements_by_xpath('//table//body//tr//td')
# this gets all the elements that I want, but makes no distinction as to which
# row each element belongs to

Any help is appreciated.

You can read the table headers and use their indexes to pick the right cells, in the right order, from each row's data.

from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
driver.get("http://www.tennisabstract.com/cgi-bin/player-classic.cgi?p=RafaelNadal")

# Header texts, used to look up column positions by name
table_headers = [th.text.strip() for th in driver.find_elements(By.CSS_SELECTOR, "#matchheader th")]
rows = driver.find_elements(By.CSS_SELECTOR, "#matches tbody > tr")

date_index = table_headers.index("Date")
tournament_index = table_headers.index("Tournament")
score_index = table_headers.index("Score")

for row in rows:
    # Search within this row only, so each cell stays tied to its row
    table_data = row.find_elements(By.TAG_NAME, "td")
    print(table_data[date_index].text, table_data[tournament_index].text, table_data[score_index].text)
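If more than a few columns are needed, looking up each index by hand gets repetitive. A minimal sketch of one alternative is to pair every header name with the cell in the same position, giving one dictionary per row. The helper name `rows_to_records` and the sample values are hypothetical, and the sketch assumes the header cells line up one-to-one with the data cells, which should be checked against the actual page.

```python
def rows_to_records(headers, rows_of_cells):
    """Pair each row's cell texts with the header names by position."""
    records = []
    for cells in rows_of_cells:
        if not cells:
            continue  # skip rows that contain no data cells
        # zip stops at the shorter sequence, so extra cells or headers are ignored
        records.append(dict(zip(headers, cells)))
    return records


# Hypothetical sample data standing in for the scraped header and cell texts
headers = ["Date", "Tournament", "Score"]
rows_of_cells = [
    ["01-Jan-2020", "Example Open", "6-4 6-3"],
    [],
    ["02-Jan-2020", "Example Open", "7-5 6-2"],
]

records = rows_to_records(headers, rows_of_cells)
for record in records:
    print(record["Date"], record["Tournament"], record["Score"])
```

With Selenium, `rows_of_cells` could be built from the `rows` list by collecting the `.text` of each row's `td` elements, and `headers` would be the `table_headers` list.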

The XPath locator for each row of the table you mean is: //table[@id="matches"]//tbody//tr

First, add the following imports:

from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By

To get each row:

driver.get('http://www.tennisabstract.com/cgi-bin/player-classic.cgi?p=RafaelNadal')

rows = WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.XPATH, '//table[@id="matches"]//tbody//tr')))

for row in rows:
    print(row.text)

Or to get each cell:

for row in rows:
    # Searching from the row element keeps the lookup inside that row
    cols = row.find_elements(By.TAG_NAME, 'td')
    for col in cols:
        print(col.text)
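To produce the comma-separated layout asked for in the question, the cell texts of each row can be joined into a single line. The following is a rough sketch, not a tested scraper: `cells_to_line`, `table_lines`, and `scrape_lines` are hypothetical helper names, and the sample data is made up. The Selenium part uses the relative XPath `./td`, which starts from the row element. An XPath beginning with `//`, as in the first attempt in the question, searches the whole document even when it is called on a row.

```python
def cells_to_line(cell_texts):
    """Join one row's cell texts into 'a, b, c'."""
    return ", ".join(text.strip() for text in cell_texts)


def table_lines(rows_of_cells):
    """Return one formatted line per non-empty row."""
    return [cells_to_line(cells) for cells in rows_of_cells if cells]


def scrape_lines(driver):
    """Collect the cell texts row by row from an already loaded page."""
    # Imported here so the formatting helpers above work without Selenium installed
    from selenium.webdriver.common.by import By

    rows = driver.find_elements(By.XPATH, '//table[@id="matches"]//tbody//tr')
    # "./td" is relative to each row; "//td" would match every cell on the page
    rows_of_cells = [
        [td.text for td in row.find_elements(By.XPATH, "./td")] for row in rows
    ]
    return table_lines(rows_of_cells)


# Hypothetical sample data, used here instead of a live browser session
sample = [
    ["Row1Element1", "Row1Element2", "Row1Element3"],
    [],
    ["Row2Element1 ", "Row2Element2", "Row2Element3"],
]
lines = table_lines(sample)
print("\n".join(lines))
```

After `driver.get(...)` has loaded the page, `print("\n".join(scrape_lines(driver)))` would print one line per table row.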
