How to extract an HTML element with Beautiful Soup's find function

Question

I am trying to use Beautiful Soup to pull out the table corresponding to the HTML code below

<table class="sortable stats_table now_sortable" id="team_pitching" data-cols-to-freeze=",2"> <caption>Team Pitching</caption>

from https://www.baseball-reference.com/register/team.cgi?id=17cdc2d2 . Here is a screenshot of the site layout and the HTML code I am trying to extract from.

I am using the code

import requests
from bs4 import BeautifulSoup as BS

url = 'https://www.baseball-reference.com/register/team.cgi?id=17cdc2d2'
res = requests.get(url)
soup1 = BS(res.content, 'html.parser')
table1  = soup1.find('table',{'id':'team_pitching'})
table1

I can't figure out how to get this to work. The table above it can be extracted with the following line

table1  = soup1.find('table',{'id':'team_batting'})

I assumed similar code would work for the table below it. Also, is there a way to extract it using the table class "sortable stats_table now_sortable" rather than the id?

Answer 1

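The problem is that if you open the page normally it shows all the tables, but if you load the page with the developer tools open, only the first table is shown. So when you make your request, the remaining tables are not included in the HTML you get back. The table you are looking for is not shown until the "Show Team Pitching" button is pressed; to handle this you could use Selenium and get the full HTML response.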
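A minimal sketch of the Selenium route suggested above, not a tested recipe for this site. The helper name table_from_rendered_page is hypothetical, and it only assumes a driver object that exposes get() and page_source, as Selenium's WebDriver does. Whether a button click or an explicit wait is needed before the table appears is not confirmed here.

```python
from bs4 import BeautifulSoup


def table_from_rendered_page(driver, url, table_id):
    """Load url in a browser driver and return the <table> with table_id, or None."""
    driver.get(url)
    # page_source holds the DOM as the browser currently sees it,
    # including any elements added by the page's JavaScript.
    soup = BeautifulSoup(driver.page_source, 'html.parser')
    return soup.find('table', {'id': table_id})


# Usage, assuming Selenium and a matching browser driver are installed:
#
# from selenium import webdriver
# driver = webdriver.Chrome()
# try:
#     url = 'https://www.baseball-reference.com/register/team.cgi?id=17cdc2d2'
#     table = table_from_rendered_page(driver, url, 'team_pitching')
# finally:
#     driver.quit()
```

Passing the driver in as a parameter keeps the browser setup separate from the parsing, so the same helper works with any browser Selenium supports.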

Answer 2

That is because the table you are looking for, i.e. the <table> with id="team_pitching", is present in the soup as a comment. You can check this for yourself.
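One way to run that check is sketched below. The HTML string is a hypothetical miniature of the page structure rather than the real response, so the example runs without a network request; on the real page you would build the soup from the downloaded HTML instead.

```python
from bs4 import BeautifulSoup, Comment

# Hypothetical stand-in for the page: the table sits inside an HTML comment.
html = '''
<div id="all_team_pitching">
<!--
<table id="team_pitching"><tr><td>Tyler Ahearn</td></tr></table>
-->
</div>
'''
soup = BeautifulSoup(html, 'html.parser')

# Commented-out markup is not parsed into tags, so find() returns None.
found_as_tag = soup.find('table', {'id': 'team_pitching'})

# The markup is still present as the text of a Comment node.
comments = soup.find_all(string=lambda s: isinstance(s, Comment))
found_in_comment = any('id="team_pitching"' in c for c in comments)

print(found_as_tag, found_in_comment)
```

If the first value is None and the second is True, the table exists only inside a comment, which is why the plain find call in the question returns nothing.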

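You need to

extract that comment from the soup
convert it into a soup object
extract the table data from that soup object.

Here is the complete code that performs the above steps.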

from bs4 import BeautifulSoup, Comment
import requests

url = 'https://www.baseball-reference.com/register/team.cgi?id=17cdc2d2'
r = requests.get(url)
soup = BeautifulSoup(r.text, 'lxml')

main_div = soup.find('div', {'id': 'all_team_pitching'})

# Extracting the comment from the <div> selected above
# (string= is the current name of the older text= argument)
for comments in main_div.find_all(string=lambda x: isinstance(x, Comment)):
    temp = comments.extract()

# Converting the above extracted comment to a soup object
s = BeautifulSoup(temp, 'lxml')
trs = s.find('table', {'id': 'team_pitching'}).find_all('tr')

# Printing the first four rows after the header (trs[1:5] yields four rows)
for tr in trs[1:5]:
    print(list(tr.stripped_strings))

The first four entries in the table

['1', 'Tyler Ahearn', '21', '1', '0', '1.000', '1.93', '6', '0', '0', '1', '9.1', '8', '5', '2', '0', '4', '14', '0', '0', '0', '42', '1.286', '7.7', '0.0', '3.9', '13.5', '3.50']
['2', 'Jack Anderson', '20', '2', '0', '1.000', '0.79', '4', '1', '0', '0', '11.1', '6', '4', '1', '0', '3', '11', '1', '0', '0', '45', '0.794', '4.8', '0.0', '2.4', '8.7', '3.67']
['3', 'Shane Drohan', '*', '21', '0', '1', '.000', '4.08', '4', '4', '0', '0', '17.2', '15', '12', '8', '0', '11', '27', '1', '0', '2', '82', '1.472', '7.6', '0.0', '5.6', '13.8', '2.45']
['4', 'Conor Grady', '21', '2', '0', '1.000', '3.00', '4', '4', '0', '0', '15.0', '10', '5', '5', '3', '8', '15', '1', '0', '2', '68', '1.200', '6.0', '1.8', '4.8', '9.0', '1.88']
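The question also asked about selecting the table by class rather than by id. The sketch below shows two ways to do that once the comment has been parsed. The HTML string is a hypothetical stand-in for the commented-out markup, and it omits now_sortable on the assumption (not verified here) that the page's JavaScript adds that class in the browser, so it may be missing from the downloaded HTML.

```python
from bs4 import BeautifulSoup, Comment

# Hypothetical stand-in for the commented-out markup on the real page.
html = '''
<div id="all_team_pitching">
<!--
<table class="sortable stats_table" id="team_pitching">
<caption>Team Pitching</caption>
<tr><th>Name</th><th>Age</th></tr>
<tr><td>Tyler Ahearn</td><td>21</td></tr>
</table>
-->
</div>
'''
soup = BeautifulSoup(html, 'html.parser')

# Take the first Comment node and parse its text as HTML.
comment = soup.find(string=lambda s: isinstance(s, Comment))
inner = BeautifulSoup(comment, 'html.parser')

# CSS selector: matches a <table> carrying both classes, in any order.
by_selector = inner.select_one('table.sortable.stats_table')

# class_ with one class name matches tables that have it among their classes.
by_class = inner.find('table', class_='stats_table')

rows = [list(tr.stripped_strings) for tr in by_selector.find_all('tr')]
print(rows)
```

Passing the full string "sortable stats_table now_sortable" to class_ only matches when the class attribute is exactly that string, so a selector or a single class name is usually more forgiving. When several tables share the same classes, the id remains the more precise choice.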


Solution 1
0 2021-10-29 05:33:36

Solution 2
0 2021-10-29 07:29:00
