Web Scraping a page with multiple tables
I am trying to scrape the second table from this website: https://fbref.com/en/comps/9/stats/Premier-League-Stats However, I have only ever managed to extract the first table when looking up the table tag. Would anyone be able to explain why I cannot access the second table, or show me how to do it?
import requests
from bs4 import BeautifulSoup
url = "https://fbref.com/en/comps/9/stats/Premier-League-Stats"
res = requests.get(url)
soup = BeautifulSoup(res.text, 'lxml')
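pl_table = soup.find_all("table")  # list of every <table> tag found in the parsed page
player_table = pl_table[0]  # the first table; the second one never shows up in the list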
Something along these lines should do it:
tables = soup.find_all("table") # returns a list of tables
second_table = tables[1]
The table is inside HTML comments (<!-- ... -->), so BeautifulSoup parses it as a comment string rather than as tags, and find_all("table") never sees it.
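To see why a table inside a comment goes missing, here is a minimal sketch that needs no network access. The markup and the ids `t1` and `t2` are made up for illustration and are not taken from fbref.com.

```python
from bs4 import BeautifulSoup, Comment

# Hypothetical miniature page: one regular table, one table wrapped in a comment.
HTML = """
<div id="visible"><table id="t1"><tr><td>first</td></tr></table></div>
<div id="hidden"><!--
<table id="t2"><tr><td>second</td></tr></table>
--></div>
"""

soup = BeautifulSoup(HTML, "html.parser")

# Only the table outside the comment is part of the parsed tag tree.
visible_ids = [t.get("id") for t in soup.find_all("table")]

# Comments are string nodes; re-parse each one to reach the markup inside.
comments = soup.find_all(string=lambda s: isinstance(s, Comment))
hidden_ids = [
    t.get("id")
    for c in comments
    for t in BeautifulSoup(c, "html.parser").find_all("table")
]

print(visible_ids)  # ['t1']
print(hidden_ids)   # ['t2']
```

Re-parsing the comment text with a second BeautifulSoup object is the same idea the example below applies to the real page.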
To get the table from the comments, you can use this example:
import requests
from bs4 import BeautifulSoup, Comment
url = 'https://fbref.com/en/comps/9/stats/Premier-League-Stats'
soup = BeautifulSoup(requests.get(url).content, 'html.parser')
table = BeautifulSoup(soup.select_one('#all_stats_standard').find_next(text=lambda x: isinstance(x, Comment)), 'html.parser')
# print some information from the table to screen:
for tr in table.select('tr:has(td)'):
    tds = [td.get_text(strip=True) for td in tr.select('td')]
    print('{:<30}{:<20}{:<10}'.format(tds[0], tds[3], tds[5]))
Prints:
Patrick van Aanholt           Crystal Palace      1990
Max Aarons                    Norwich City        2000
Tammy Abraham                 Chelsea             1997
Che Adams                     Southampton         1996
Adrián                        Liverpool           1987
Sergio Agüero                 Manchester City     1988
Albian Ajeti                  West Ham            1997
Nathan Aké                    Bournemouth         1995
Marc Albrighton               Leicester City      1989
Toby Alderweireld             Tottenham           1989
...and so on.
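If you would rather not depend on column positions such as tds[3], one option is to pair each cell with its header text. This is only a sketch on simplified, made-up markup: the three-column layout and the header names are assumptions, and the real table has many more columns and may repeat header rows.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified markup; the real page has many more columns.
HTML = """
<table>
  <thead><tr><th>Player</th><th>Squad</th><th>Born</th></tr></thead>
  <tbody>
    <tr><td>Max Aarons</td><td>Norwich City</td><td>2000</td></tr>
    <tr><td>Tammy Abraham</td><td>Chelsea</td><td>1997</td></tr>
  </tbody>
</table>
"""

table = BeautifulSoup(HTML, "html.parser")

# Column names come from the header cells.
headers = [th.get_text(strip=True) for th in table.select("thead th")]

rows = []
for tr in table.select("tr:has(td)"):  # skips rows that contain only <th>
    cells = [td.get_text(strip=True) for td in tr.select("td")]
    rows.append(dict(zip(headers, cells)))

print(rows[0]["Player"], rows[0]["Squad"], rows[0]["Born"])
```

Each row then becomes a dict, so a lookup like rows[0]["Squad"] keeps working even if columns are reordered.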