[英]Can't seem to scrape tbody from this website
我正在尝试从该网站上抓取数据: https://web.archive.org/web/20130725021041/http://www.usatoday.com/sports/nfl/injuries/
page = requests.get('https://web.archive.org/web/20130725021041/http://www.usatoday.com/sports/nfl/injuries/')
soup = BeautifulSoup(page.text, 'html.parser')
soup.find_all('tbody')
soup.find_all('tbody') 返回 []。 我不完全确定为什么。
这是我试图刮掉的 tbody 部分:
<tbody><tr class="page"><td>
7/23/2013
</td><td>
Anthony Spencer
</td><td>
Cowboys
</td><td>
DE
</td><td>
Knee
</td><td>
Knee
</td><td>
Out
</td><td>
Is questionable for 9/8 against the NY Giants
</td></tr><tr class="page"><td>
7/22/2013
</td><td>
Tyrone Crawford
</td><td>
Cowboys
</td><td>
DE
</td><td>
Achilles-tendon
</td><td>
Achilles
</td><td>
Out
</td><td>
Is expected to be placed on injured reserve
</td></tr><tr class="page"><td>
7/16/2013
</td><td>
Ryan Broyles
</td><td>
Lions
</td><td>
WR
</td><td>
Knee
</td><td>
Knee
</td><td>
Questionable
</td><td>
Is questionable for 9/8 against Minnesota
</td></tr><tr class="page"><td>
7/2/2013
</td><td>
Jahvid Best
</td><td>
Lions
</td><td>
RB
</td><td>
Concussion
</td><td>
Concussion
</td><td>
Out
</td><td>
Is out indefinitely
</td></tr><tr class="page"><td>
7/2/2013
</td><td>
Jerel Worthy
</td><td>
Packers
</td><td>
DE
</td><td>
Knee
</td><td>
Knee
</td><td>
Out
</td><td>
Is out indefinitely
</td></tr><tr class="page"><td>
7/2/2013
</td><td>
JC Tretter
</td><td>
Packers
</td><td>
TO
</td><td>
Ankle
</td><td>
Ankle
</td><td>
Out
</td><td>
Is out indefinitely
</td></tr><tr class="page"><td>
</td></tr></tbody>
有人可以帮助我,让我知道为什么 tbody 上的 find_all 返回一个空列表吗? 即使我尝试使用 class 页面进行 tr,它也会返回一个空列表。
似乎是 html 的问题。 切换到使用“lxml”作为解析器而不是“html.parser”。 老实说,我也只会使用 pandas 。
import requests
from bs4 import BeautifulSoup as bs
r = requests.get('https://web.archive.org/web/20130725021041/http://www.usatoday.com/sports/nfl/injuries/')
soup = bs(r.content, 'lxml')
print(len(soup.find_all('tbody')))
或者,更简单的表:
import pandas as pd
df = pd.read_html('https://web.archive.org/web/20130725021041/http://www.usatoday.com/sports/nfl/injuries/')[0]
print(df)
声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.