Web Scraping NBA Reference for Misc Stats
I'm new to web scraping and trying to retrieve the miscellaneous table from https://www.basketball-reference.com/leagues/NBA_2021.html using BeautifulSoup. I have some code written, but I'm unable to print the required table; the lookup just returns None.
from bs4 import BeautifulSoup
from urllib.request import urlopen
import pandas as pd
url = "http://www.basketball-reference.com/leagues/NBA_2021.html"
data = urlopen(url)
soup = BeautifulSoup(data)
table = soup.find('table', id='misc_stats')
print(table)
Any help would be appreciated. Thank you.
The sports-reference.com sites keep some of those tables inside comments in the source HTML. So you need to pull out the comments, then parse the tables in there:
import requests
from bs4 import BeautifulSoup, Comment
import pandas as pd
headers = {'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.104 Safari/537.36'}
url = "http://www.basketball-reference.com/leagues/NBA_2021.html"
response = requests.get(url, headers=headers)
soup = BeautifulSoup(response.content, 'html.parser')
comments = soup.find_all(string=lambda text: isinstance(text, Comment))
tables = []
for each in comments:
    if 'table' in str(each):
        try:
            tables.append(pd.read_html(each, attrs = {'id': 'misc_stats'}, header=1)[0])
        except ValueError:
            # read_html raises ValueError when this comment has no table with that id
            continue
df = tables[0]
Output:
print(df.to_string())
Rk Team Age W L PW PL MOV SOS SRS ORtg DRtg NRtg Pace FTr 3PAr TS% eFG% TOV% ORB% FT/FGA eFG%.1 TOV%.1 DRB% FT/FGA.1 Arena Attend. Attend./G
0 1.0 Los Angeles Lakers 29.0 16.0 6.0 16 6 7.73 -0.08 7.65 112.6 104.8 7.8 99.3 0.257 0.352 0.578 0.547 13.1 23.3 0.193 0.513 12.9 80.5 0.152 STAPLES Center 0 0
1 2.0 Utah Jazz 28.5 16.0 5.0 15 6 7.57 -0.20 7.38 115.9 108.2 7.7 98.2 0.245 0.485 0.587 0.557 13.1 26.2 0.185 0.507 10.4 79.7 0.152 Vivint Smart Home Arena 21290 1935
2 3.0 Milwaukee Bucks 28.0 12.0 8.0 15 5 8.15 -0.79 7.36 118.4 110.4 8.0 101.8 0.225 0.425 0.598 0.576 12.5 24.7 0.164 0.532 11.7 79.7 0.161 Fiserv Forum 0 0
3 4.0 Los Angeles Clippers 29.1 16.0 6.0 16 6 7.23 -1.13 6.10 117.8 110.4 7.4 97.4 0.235 0.413 0.603 0.565 12.1 22.2 0.200 0.537 13.3 80.0 0.189 STAPLES Center 0 0
4 5.0 Denver Nuggets 26.5 12.0 8.0 13 7 4.95 -0.19 4.76 117.0 112.0 5.0 97.1 0.244 0.383 0.587 0.556 12.5 26.1 0.187 0.551 13.8 78.7 0.204 Ball Arena 0 0
5 6.0 Brooklyn Nets 28.1 14.0 9.0 14 9 4.48 -0.56 3.92 117.9 113.5 4.4 101.9 0.264 0.415 0.620 0.584 13.4 20.5 0.217 0.524 10.6 77.6 0.192 Barclays Center 0 0
6 7.0 Phoenix Suns 26.6 11.0 8.0 11 8 2.84 0.24 3.08 110.8 108.0 2.8 97.5 0.214 0.428 0.572 0.537 12.3 18.9 0.179 0.521 12.4 80.0 0.193 Phoenix Suns Arena 0 0
7 8.0 Philadelphia 76ers 26.7 15.0 6.0 13 8 4.19 -1.13 3.06 111.5 107.4 4.1 101.6 0.299 0.351 0.576 0.538 13.8 23.7 0.228 0.515 13.4 77.9 0.199 Wells Fargo Center 0 0
8 9.0 Atlanta Hawks 24.3 10.0 10.0 12 8 2.50 0.26 2.76 112.2 109.7 2.5 99.2 0.298 0.396 0.564 0.517 12.9 25.1 0.243 0.506 11.5 77.1 0.203 State Farm Arena 3529 353
9 10.0 Boston Celtics 25.5 11.0 8.0 11 8 2.53 -0.03 2.50 112.4 109.8 2.6 99.3 0.236 0.359 0.570 0.541 13.4 25.3 0.178 0.536 13.8 77.9 0.209 TD Garden 0 0
10 11.0 Memphis Grizzlies 24.8 9.0 7.0 9 7 1.31 1.15 2.47 108.9 107.6 1.3 100.6 0.192 0.327 0.551 0.523 12.4 23.0 0.149 0.530 14.6 77.5 0.190 FedEx Forum 410 51
11 12.0 Indiana Pacers 26.8 12.0 9.0 12 9 2.71 -0.33 2.38 113.0 110.3 2.7 99.9 0.238 0.381 0.583 0.553 12.6 20.4 0.182 0.533 13.3 76.9 0.194 Bankers Life Fieldhouse 0 0
12 13.0 Houston Rockets 28.4 10.0 9.0 11 8 2.95 -0.97 1.98 109.4 106.5 2.9 102.1 0.255 0.445 0.573 0.541 13.7 19.3 0.193 0.512 13.5 76.8 0.195 Toyota Center 28141 3127
13 14.0 Toronto Raptors 27.2 9.0 12.0 12 9 1.67 -1.33 0.34 111.6 109.9 1.7 100.2 0.238 0.479 0.570 0.532 12.9 22.0 0.195 0.533 14.9 77.4 0.234 Amalie Arena 10989 999
14 15.0 Dallas Mavericks 26.4 8.0 13.0 9 12 -2.00 2.00 0.00 109.6 111.6 -2.0 98.7 0.264 0.411 0.559 0.525 11.3 18.5 0.199 0.530 12.7 76.7 0.216 American Airlines Center 0 0
15 16.0 San Antonio Spurs 26.9 11.0 10.0 10 11 -1.05 0.92 -0.13 110.3 111.3 -1.0 100.3 0.224 0.331 0.550 0.516 10.0 19.9 0.175 0.547 12.5 78.8 0.156 AT&T Center 0 0
16 17.0 Golden State Warriors 26.7 11.0 10.0 10 11 -1.05 0.77 -0.28 108.6 109.6 -1.0 103.2 0.262 0.417 0.563 0.527 12.7 18.4 0.201 0.514 13.5 75.8 0.249 Chase Center 0 0
17 18.0 Charlotte Hornets 24.8 10.0 11.0 10 11 -0.62 -0.41 -1.03 110.2 110.8 -0.6 99.0 0.247 0.414 0.560 0.529 13.1 23.3 0.185 0.544 14.1 75.0 0.166 Spectrum Center 0 0
18 19.0 New York Knicks 24.4 9.0 13.0 10 12 -2.00 0.53 -1.47 107.1 109.2 -2.1 95.4 0.264 0.319 0.538 0.500 12.6 23.8 0.203 0.503 10.7 76.9 0.198 Madison Square Garden (IV) 0 0
19 20.0 Portland Trail Blazers 27.3 11.0 9.0 9 11 -1.65 -0.07 -1.72 115.0 116.6 -1.6 99.8 0.229 0.460 0.567 0.529 10.1 21.5 0.190 0.560 12.2 78.0 0.209 Moda Center 0 0
20 21.0 Chicago Bulls 24.9 8.0 11.0 8 11 -2.26 0.36 -1.90 110.9 113.1 -2.2 103.4 0.246 0.413 0.590 0.556 15.2 20.8 0.196 0.553 12.9 80.0 0.217 United Center 0 0
21 22.0 New Orleans Pelicans 25.1 7.0 12.0 8 11 -2.58 -0.17 -2.75 110.3 112.8 -2.5 99.6 0.284 0.365 0.558 0.526 13.4 25.6 0.203 0.549 12.8 79.9 0.193 Smoothie King Center 8820 980
22 23.0 Detroit Pistons 26.3 5.0 16.0 7 14 -4.67 1.82 -2.85 107.7 112.4 -4.7 98.5 0.273 0.408 0.544 0.501 13.0 22.6 0.215 0.558 14.2 76.6 0.194 Little Caesars Arena 0 0
23 24.0 Cleveland Cavaliers 24.7 10.0 11.0 8 13 -4.19 0.04 -4.15 104.9 109.1 -4.2 97.2 0.254 0.309 0.536 0.505 14.4 25.9 0.181 0.537 14.9 75.3 0.170 Quicken Loans Arena 12564 1142
24 25.0 Miami Heat 26.7 7.0 13.0 7 13 -5.45 0.29 -5.16 106.9 112.3 -5.4 98.9 0.263 0.452 0.581 0.547 15.7 17.0 0.204 0.543 13.3 76.6 0.183 AmericanAirlines Arena 0 0
25 26.0 Sacramento Kings 25.7 9.0 11.0 7 13 -5.80 0.45 -5.35 112.7 118.4 -5.7 100.1 0.283 0.377 0.576 0.546 13.0 23.5 0.203 0.558 11.6 75.8 0.194 Golden 1 Center 0 0
26 27.0 Washington Wizards 26.2 4.0 13.0 6 11 -5.29 -0.85 -6.14 112.1 117.2 -5.1 104.4 0.282 0.374 0.569 0.534 11.6 20.7 0.212 0.565 12.8 78.9 0.251 Capital One Arena 0 0
27 28.0 Oklahoma City Thunder 23.7 8.0 11.0 5 14 -8.26 0.61 -7.66 105.2 113.3 -8.1 101.3 0.243 0.446 0.556 0.527 12.9 15.7 0.176 0.537 10.9 77.7 0.157 Chesapeake Energy Arena 0 0
28 29.0 Orlando Magic 26.2 8.0 14.0 6 16 -6.82 -1.40 -8.22 105.5 112.3 -6.8 98.9 0.220 0.358 0.526 0.490 12.2 24.1 0.174 0.547 12.4 79.7 0.173 Amway Center 35768 3252
29 30.0 Minnesota Timberwolves 23.5 5.0 15.0 5 15 -9.30 0.55 -8.76 104.6 113.7 -9.1 101.1 0.230 0.377 0.530 0.497 12.7 23.3 0.174 0.539 13.3 75.0 0.217 Target Center 0 0
30 NaN League Average 26.3 NaN NaN 10 10 0.00 0.00 0.00 111.1 111.1 NaN 99.8 0.250 0.396 0.568 0.534 12.8 22.2 0.193 0.534 12.8 77.8 0.193 NaN 4050 400
If you look at the source HTML, you'll see that the tables in the comments start with <!--
BeautifulSoup skips over those: it treats each one as a Comment node and does not parse the markup inside it, so soup.find('table', id='misc_stats') returns None. Hence, you need to add the part of the code that specifically finds the comments:
comments = soup.find_all(string=lambda text: isinstance(text, Comment))
Once you have all the comments, you can iterate through each one to see if there's a table in it. If there is, parse it as you normally would with non-commented <table> tags.
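To see the behaviour without hitting the site, here is a minimal sketch on a made-up page. SAMPLE_HTML, its contents, and the helper name find_commented_table are assumptions for illustration only, not part of the real page or of BeautifulSoup. It shows that a direct find() misses the commented-out table, while re-parsing each comment finds it.

```python
from bs4 import BeautifulSoup, Comment

# Hypothetical miniature page: one table is live markup,
# the other is wrapped in an HTML comment.
SAMPLE_HTML = """
<html><body>
<table id="visible_stats"><tr><td>Lakers</td></tr></table>
<!--
<table id="misc_stats"><tr><th>Team</th><th>Pace</th></tr>
<tr><td>Utah Jazz</td><td>98.2</td></tr></table>
-->
</body></html>
"""


def find_commented_table(html, table_id):
    """Return the <table> Tag with the given id, also looking inside comments."""
    soup = BeautifulSoup(html, "html.parser")

    # First try the normal lookup on the live markup.
    table = soup.find("table", id=table_id)
    if table is not None:
        return table

    # Otherwise re-parse the text of every comment as its own HTML fragment.
    for comment in soup.find_all(string=lambda text: isinstance(text, Comment)):
        inner = BeautifulSoup(str(comment), "html.parser")
        table = inner.find("table", id=table_id)
        if table is not None:
            return table
    return None


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
direct_hit = soup.find("table", id="misc_stats")  # the commented table is invisible here

table = find_commented_table(SAMPLE_HTML, "misc_stats")
rows = [
    [cell.get_text(strip=True) for cell in tr.find_all(["th", "td"])]
    for tr in table.find_all("tr")
]

print(direct_hit)
print(rows)
```

Once the table has been located this way, it can be handed to pandas.read_html, as in the answer's code, to get a DataFrame.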