Requests and Beautiful Soup can't find attribute on website

Question

I am currently using requests and Beautiful Soup to scrape pro-football-reference.com. I have run into a field that my code does not recognize. The exact URL is https://www.pro-football-reference.com/boxscores/201809060phi.htm, and the code is as follows:

import requests
from bs4 import BeautifulSoup

game_page = requests.get('https://www.pro-football-reference.com/boxscores/201809060phi.htm')
game_page_soup = BeautifulSoup(game_page.content, 'html.parser')
game_info = game_page_soup.find(id='game_info')
print(game_info)

The output is None. However, this element should be returned:

<table class="suppress_all sortable stats_table now_sortable" id="game_info" data-cols-to-freeze="0"><thead><tr class="thead onecell"><td class="right center" data-stat="onecell" colspan="2">Game Info</td></tr></thead>
    <caption>Game Info Table</caption>
    <tbody>
<tr data-row="0"><th scope="row" class="center " data-stat="info">Won Toss</th><td class="center " data-stat="stat">Eagles (deferred)</td></tr>
<tr data-row="1"><th scope="row" class="center " data-stat="info">Roof</th><td class="center " data-stat="stat">outdoors</td></tr>
<tr data-row="2"><th scope="row" class="center " data-stat="info">Surface</th><td class="center " data-stat="stat">grass </td></tr>
<tr data-row="3"><th scope="row" class="center " data-stat="info">Duration</th><td class="center " data-stat="stat">3:19</td></tr>
<tr data-row="4"><th scope="row" class="center " data-stat="info">Attendance</th><td class="center " data-stat="stat"><a href="/years/2018/attendance.htm">69,696</a></td></tr>
<tr data-row="5"><th scope="row" class="center " data-stat="info">Weather</th><td class="center " data-stat="stat">81 degrees, wind 8 mph</td></tr>
<tr data-row="6"><th scope="row" class="center " data-stat="info">Vegas Line</th><td class="center " data-stat="stat">Philadelphia Eagles -1.0</td></tr>
<tr data-row="7"><th scope="row" class="center " data-stat="info">Over/Under</th><td class="center " data-stat="stat">44.5 <b>(under)</b></td></tr>

</tbody></table>

Why isn't this being returned?

Answer 1

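The table is inside an HTML comment (<!-- ... -->), so Beautiful Soup stores it as comment text rather than as tags, which is why find(id='game_info') returns None. To load it, you can use the following example: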

import requests
from bs4 import BeautifulSoup, Comment

url = "https://www.pro-football-reference.com/boxscores/201809060phi.htm"
soup = BeautifulSoup(requests.get(url).content, "html.parser")

# find the table inside HTML comment <!-- -->
table = soup.find("h2", string="Game Info").find_next(
    string=lambda t: isinstance(t, Comment)
)
table = BeautifulSoup(table, "html.parser").table

# print some data from table:
for tr in table.select("tr"):
    print(tr.get_text(strip=True, separator=" "))

Prints:

Game Info
Won Toss Eagles (deferred)
Roof outdoors
Surface grass
Duration 3:19
Attendance 69,696
Weather 81 degrees, wind 8 mph
Vegas Line Philadelphia Eagles -1.0
Over/Under 44.5 (under)
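To see why find(id='game_info') returns None, here is a minimal offline sketch. The HTML string is a hypothetical, shortened stand-in for the real page, assuming (as this answer says) that the table is wrapped in <!-- -->. The parser keeps everything inside the comment as a single Comment string, so the table only becomes real tags once that string is parsed again.

```python
from bs4 import BeautifulSoup, Comment

# Hypothetical, shortened stand-in for the live page: the table sits inside an HTML comment
html = """
<div id="all_game_info">
  <h2>Game Info</h2>
  <!--
  <table id="game_info"><tbody>
    <tr><th>Roof</th><td>outdoors</td></tr>
    <tr><th>Surface</th><td>grass</td></tr>
  </tbody></table>
  -->
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# The commented-out markup is not part of the tag tree, so this finds nothing
print(soup.find(id="game_info"))  # None

# Instead, it is stored as one Comment string
comment = soup.find(string=lambda s: isinstance(s, Comment))
print(type(comment).__name__)  # Comment

# Parsing the comment text as HTML turns the table into real tags
table = BeautifulSoup(comment, "html.parser").find(id="game_info")
rows = {tr.th.get_text(strip=True): tr.td.get_text(strip=True) for tr in table.find_all("tr")}
print(rows)  # {'Roof': 'outdoors', 'Surface': 'grass'}
```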

Answer 2

You can try this:

import requests
from bs4 import BeautifulSoup, Comment

url = "https://www.pro-football-reference.com/boxscores/201809060phi.htm"
soup = BeautifulSoup(requests.get(url).content, "lxml")

game_info_div = soup.find("div", attrs={"id": "all_game_info"})

# Approach 1: select the commented-out HTML inside <div id="all_game_info"> using bs4.Comment
table_html = game_info_div.find(string=lambda text: isinstance(text, Comment))

# Approach 2 (alternative to Approach 1): split the div's HTML on "--" from the right,
# at most twice (giving 3 parts), and take the middle part, i.e. the text between <!-- and -->
# table_html = str(game_info_div).rsplit("--", 2)[1]

table = BeautifulSoup(table_html, "lxml")

# Pair each body row's <th> with the <td> in the same row. Zipping all <th> and all <td>
# separately shifts every value by one, because the "Game Info" header row has a <td> but no <th>
for tr in table.select("tbody tr"):
    print(tr.th.text, " : ", tr.td.text)

Both approaches produce the same output:

Won Toss  :  Eagles (deferred)
Roof  :  outdoors
Surface  :  grass
Duration  :  3:19
Attendance  :  69,696
Weather  :  81 degrees, wind 8 mph
Vegas Line  :  Philadelphia Eagles -1.0
Over/Under  :  44.5 (under)
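Both answers rely on the same idea, so the search can be wrapped in a small reusable helper. The names find_including_comments and table_to_dict are hypothetical (they are not part of bs4): the first looks in the normal tag tree and, if nothing matches, re-parses every HTML comment and searches inside it; the second maps each body row's label to its value. The sample HTML is again a shortened, made-up stand-in for the page.

```python
from bs4 import BeautifulSoup, Comment


def find_including_comments(soup, *args, parser="html.parser", **kwargs):
    """Hypothetical helper: like soup.find(), but also searches markup hidden in <!-- --> comments."""
    # 1) Try the visible tag tree first
    found = soup.find(*args, **kwargs)
    if found is not None:
        return found
    # 2) Re-parse each comment as HTML and search inside it
    for comment in soup.find_all(string=lambda s: isinstance(s, Comment)):
        found = BeautifulSoup(comment, parser).find(*args, **kwargs)
        if found is not None:
            return found
    return None


def table_to_dict(table):
    """Hypothetical helper: map each body row's <th> label to the <td> value in the same row."""
    return {
        tr.th.get_text(strip=True): tr.td.get_text(" ", strip=True)
        for tr in table.select("tbody tr")
        if tr.th is not None and tr.td is not None
    }


# Made-up sample markup shaped like the question's table (header row has a <td> but no <th>)
html = """
<div id="all_game_info"><!--
<table id="game_info">
<thead><tr><td colspan="2">Game Info</td></tr></thead>
<tbody>
<tr><th>Won Toss</th><td>Eagles (deferred)</td></tr>
<tr><th>Over/Under</th><td>44.5 <b>(under)</b></td></tr>
</tbody></table>
--></div>
"""

soup = BeautifulSoup(html, "html.parser")
game_info = find_including_comments(soup, id="game_info")
print(table_to_dict(game_info))
# {'Won Toss': 'Eagles (deferred)', 'Over/Under': '44.5 (under)'}
```

Checking the visible tree first keeps the helper cheap when an element is not commented out; for the boxscore page in the question you would pass the soup built from the requests response instead of the sample string.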

2 answers

Answer 1
1 2021-07-25 17:12:34

Answer 2
0 2021-07-25 18:12:51
