Scraping each element from each row of an HTML table
Link to website: http://www.tennisabstract.com/cgi-bin/player-classic.cgi?p=RafaelNadal
I am trying to write code that goes through each row in a table and extracts each element from that row. I am aiming for output in the following layout:
Row1Element1, Row1Element2, Row1Element3
Row2Element1, Row2Element2, Row2Element3
Row3Element1, Row3Element2, Row3Element3
I have made two main attempts at coding this.
Attempt 1:
rows = driver.find_elements_by_xpath('//table//body//tr')
elements = rows.find_elements_by_xpath('//td')
# This gets all rows in the table, but then gets all elements on the page,
# not just those in the table
Attempt 2:
driver.find_elements_by_xpath('//table//body//tr//td')
# This gets all the elements that I want, but makes no distinction as to
# which row each element belongs to
Any help is appreciated.
You can get the table headers and use their indexes to pick the right cells, in the right order, from each row's data.
from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
driver.get("http://www.tennisabstract.com/cgi-bin/player-classic.cgi?p=RafaelNadal")

# Header texts, in column order
table_headers = [th.text.strip() for th in driver.find_elements(By.CSS_SELECTOR, "#matchheader th")]
rows = driver.find_elements(By.CSS_SELECTOR, "#matches tbody > tr")

# Column positions of the fields of interest
date_index = table_headers.index("Date")
tournament_index = table_headers.index("Tournament")
score_index = table_headers.index("Score")

for row in rows:
    # Look up the cells inside this row only
    table_data = row.find_elements(By.TAG_NAME, "td")
    print(table_data[date_index].text, table_data[tournament_index].text, table_data[score_index].text)
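The header-lookup idea above can be extended to every column at once. The following is only a minimal sketch, not part of the original answer: `rows_to_records` is a hypothetical helper, and it assumes the header texts and each row's cell texts have already been read from the page into plain lists of strings. The sample values are made up for illustration.

```python
def rows_to_records(headers, rows):
    """Pair each row's cell texts with the header names.

    headers: list of column names, in column order.
    rows: list of rows, each a list of cell texts.
    Rows whose cell count differs from the header count are skipped
    (an assumption made here, e.g. to ignore spacer rows).
    """
    records = []
    for cells in rows:
        if len(cells) != len(headers):
            continue
        records.append(dict(zip(headers, cells)))
    return records


# Made-up sample data standing in for text scraped from the page
sample_headers = ["Date", "Tournament", "Score"]
sample_rows = [
    ["2019-01-01", "Example Open", "6-4 6-3"],
    ["spacer"],  # wrong number of cells, so it is skipped
    ["2019-01-08", "Sample Cup", "7-6(5) 6-2"],
]

records = rows_to_records(sample_headers, sample_rows)
for record in records:
    print(record["Date"], record["Tournament"], record["Score"])
```

Looking cells up by header name rather than by a stored index keeps the printing code readable if more columns are needed later.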
This is the locator for each row of the table you mean:
XPATH: //table[@id="matches"]//tbody//tr
First, add the following imports:
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
Each row:
driver.get('http://www.tennisabstract.com/cgi-bin/player-classic.cgi?p=RafaelNadal')
rows = WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.XPATH, '//table[@id="matches"]//tbody//tr')))
for row in rows:
    print(row.text)
Or each cell:
for row in rows:
    # Search for cells inside this row only
    cols = row.find_elements(By.TAG_NAME, 'td')
    for col in cols:
        print(col.text)
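The per-cell loop above prints one cell per line. To get the "Row1Element1, Row1Element2, Row1Element3" layout asked for in the question, the cell texts of each row can be joined instead. The sketch below is an illustration under stated assumptions: `table_rows_as_text` is a hypothetical helper name, the default row XPath is the one given in this answer, and the literal string "xpath" is used because it is the value of `By.XPATH` (which keeps the block free of imports). It also shows what goes wrong in Attempt 1: `find_elements` returns a list, so the cell lookup has to happen per row, and an XPath starting with `//` searches the whole document even when called on a row, whereas `./td` stays inside that row.

```python
def table_rows_as_text(driver, row_xpath='//table[@id="matches"]//tbody//tr'):
    """Return one comma-separated string per table row.

    driver: a Selenium WebDriver (or any object with the same
    find_elements(by, value) interface).
    """
    lines = []
    for row in driver.find_elements("xpath", row_xpath):
        # "./td" is relative to this row; "//td" would match every
        # cell on the page, regardless of the row it is called on.
        cells = row.find_elements("xpath", "./td")
        lines.append(", ".join(cell.text.strip() for cell in cells))
    return lines


# Usage with a live driver that has already loaded the page:
#     for line in table_rows_as_text(driver):
#         print(line)
```

Returning a list of strings rather than printing directly leaves the caller free to write the rows to a file or split them again later.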