Can't seem to scrape tbody from this website
I'm trying to scrape data from this website: https://web.archive.org/web/20130725021041/http://www.usatoday.com/sports/nfl/injuries/
import requests
from bs4 import BeautifulSoup

page = requests.get('https://web.archive.org/web/20130725021041/http://www.usatoday.com/sports/nfl/injuries/')
soup = BeautifulSoup(page.text, 'html.parser')
soup.find_all('tbody')
soup.find_all('tbody') returns []. I'm not entirely sure why.
This is the tbody section I'm trying to scrape:
<tbody><tr class="page"><td>
7/23/2013
</td><td>
Anthony Spencer
</td><td>
Cowboys
</td><td>
DE
</td><td>
Knee
</td><td>
Knee
</td><td>
Out
</td><td>
Is questionable for 9/8 against the NY Giants
</td></tr><tr class="page"><td>
7/22/2013
</td><td>
Tyrone Crawford
</td><td>
Cowboys
</td><td>
DE
</td><td>
Achilles-tendon
</td><td>
Achilles
</td><td>
Out
</td><td>
Is expected to be placed on injured reserve
</td></tr><tr class="page"><td>
7/16/2013
</td><td>
Ryan Broyles
</td><td>
Lions
</td><td>
WR
</td><td>
Knee
</td><td>
Knee
</td><td>
Questionable
</td><td>
Is questionable for 9/8 against Minnesota
</td></tr><tr class="page"><td>
7/2/2013
</td><td>
Jahvid Best
</td><td>
Lions
</td><td>
RB
</td><td>
Concussion
</td><td>
Concussion
</td><td>
Out
</td><td>
Is out indefinitely
</td></tr><tr class="page"><td>
7/2/2013
</td><td>
Jerel Worthy
</td><td>
Packers
</td><td>
DE
</td><td>
Knee
</td><td>
Knee
</td><td>
Out
</td><td>
Is out indefinitely
</td></tr><tr class="page"><td>
7/2/2013
</td><td>
JC Tretter
</td><td>
Packers
</td><td>
TO
</td><td>
Ankle
</td><td>
Ankle
</td><td>
Out
</td><td>
Is out indefinitely
</td></tr><tr class="page"><td>
</td></tr></tbody>
Can someone help me understand why find_all on tbody returns an empty list? Even when I search for tr elements with class "page", I also get an empty list.
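This looks like a problem with the page's HTML. Parsers build different trees from invalid markup, so switch to 'lxml' as the parser instead of 'html.parser' (lxml is a separate install: pip install lxml). Honestly, I would also just use pandas.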
import requests
from bs4 import BeautifulSoup as bs

r = requests.get('https://web.archive.org/web/20130725021041/http://www.usatoday.com/sports/nfl/injuries/')
# Parse with lxml instead of html.parser
soup = bs(r.content, 'lxml')
# A non-zero count means the tbody elements were found
print(len(soup.find_all('tbody')))
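To check whether the parser really is the variable here, it can help to count the same tag under each parser you have installed. The following is only a rough sketch: `count_tag_by_parser` and the inline `sample` markup are hypothetical names introduced for illustration, and a parser that is not installed is reported as `None` rather than raising.

```python
from bs4 import BeautifulSoup, FeatureNotFound


def count_tag_by_parser(html, tag="tbody",
                        parsers=("html.parser", "lxml", "html5lib")):
    """Return {parser name: number of <tag> elements, or None if not installed}."""
    counts = {}
    for parser in parsers:
        try:
            soup = BeautifulSoup(html, parser)
        except FeatureNotFound:
            # bs4 raises this when the requested parser library is missing
            counts[parser] = None
            continue
        counts[parser] = len(soup.find_all(tag))
    return counts


# Small well-formed sample (an assumption, not the real page)
sample = "<table><tbody><tr><td>Knee</td></tr></tbody></table>"
counts = count_tag_by_parser(sample)
print(counts)
```

On the archived page you would pass `r.text` instead of `sample`. If every installed parser reports 0, it is also worth checking whether `'<tbody' in r.text` is true at all, since the table might not be in the raw response.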
Alternatively, and more simply, read the table directly with pandas:
import pandas as pd

# read_html returns a list of DataFrames, one per table found; [0] takes the first
df = pd.read_html('https://web.archive.org/web/20130725021041/http://www.usatoday.com/sports/nfl/injuries/')[0]
print(df)
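If you prefer to stay with BeautifulSoup, the rows from the question can be turned into records by hand once the tbody is reachable. This is a minimal sketch run on an inline copy of two rows from the question, not on the live page. The names in `COLUMNS` are assumptions, since the question shows only the td cells and not the header row, and `rows_from_tbody` is a hypothetical helper.

```python
from bs4 import BeautifulSoup

# Hypothetical column names: the header row is not shown in the question
COLUMNS = ["date", "player", "team", "position",
           "injury", "body_part", "status", "notes"]

# Two data rows copied from the question, plus the trailing near-empty row
SAMPLE_HTML = """
<table><tbody><tr class="page"><td>
7/23/2013
</td><td>
Anthony Spencer
</td><td>
Cowboys
</td><td>
DE
</td><td>
Knee
</td><td>
Knee
</td><td>
Out
</td><td>
Is questionable for 9/8 against the NY Giants
</td></tr><tr class="page"><td>
7/22/2013
</td><td>
Tyrone Crawford
</td><td>
Cowboys
</td><td>
DE
</td><td>
Achilles-tendon
</td><td>
Achilles
</td><td>
Out
</td><td>
Is expected to be placed on injured reserve
</td></tr><tr class="page"><td>
</td></tr></tbody></table>
"""


def rows_from_tbody(html, parser="html.parser"):
    """Collect each full tr.page row as a dict keyed by COLUMNS."""
    soup = BeautifulSoup(html, parser)
    records = []
    for tr in soup.find_all("tr", class_="page"):
        # get_text(strip=True) drops the newlines around each cell value
        cells = [td.get_text(strip=True) for td in tr.find_all("td")]
        if len(cells) != len(COLUMNS):
            continue  # skip filler rows such as the final single-cell row
        records.append(dict(zip(COLUMNS, cells)))
    return records


records = rows_from_tbody(SAMPLE_HTML)
for record in records:
    print(record["date"], record["player"], record["status"])
```

For the archived page, pass the response text and 'lxml' as the parser, as in the answer above. The length check is there because the snippet in the question ends with a row that holds a single empty cell.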