Web Scraping NBA Reference for Misc Stats

I'm new to web scraping and am trying to retrieve the miscellaneous stats table from https://www.basketball-reference.com/leagues/NBA_2021.html using BeautifulSoup. I have written some code, but it does not print the required table; it just prints None.

from bs4 import BeautifulSoup
from urllib.request import urlopen
import pandas as pd 

url = "http://www.basketball-reference.com/leagues/NBA_2021.html"
data = urlopen(url)
soup = BeautifulSoup(data)

table = soup.find('table', id='misc_stats')
print(table)

Any help would be appreciated. Thank you.

The sports-reference.com sites keep some of their tables inside comments in the source HTML. So you need to pull out the comments, then parse the tables in there:

from io import StringIO

import requests
from bs4 import BeautifulSoup, Comment
import pandas as pd

headers = {'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.104 Safari/537.36'}

url = "http://www.basketball-reference.com/leagues/NBA_2021.html"
response = requests.get(url, headers=headers)
soup = BeautifulSoup(response.content, 'html.parser')

# Collect every HTML comment node on the page
comments = soup.find_all(string=lambda text: isinstance(text, Comment))

tables = []
for each in comments:
    if 'table' in str(each):
        try:
            # Wrap the markup in StringIO; newer pandas versions deprecate passing literal HTML strings
            tables.append(pd.read_html(StringIO(str(each)), attrs={'id': 'misc_stats'}, header=1)[0])
        except Exception:
            # This comment has no parsable table with id 'misc_stats'; try the next one
            continue

df = tables[0]

Output:

print(df.to_string())
      Rk                    Team   Age     W     L  PW  PL   MOV   SOS   SRS   ORtg   DRtg  NRtg   Pace    FTr   3PAr    TS%   eFG%  TOV%  ORB%  FT/FGA  eFG%.1  TOV%.1  DRB%  FT/FGA.1                       Arena  Attend.  Attend./G
0    1.0      Los Angeles Lakers  29.0  16.0   6.0  16   6  7.73 -0.08  7.65  112.6  104.8   7.8   99.3  0.257  0.352  0.578  0.547  13.1  23.3   0.193   0.513    12.9  80.5     0.152              STAPLES Center        0          0
1    2.0               Utah Jazz  28.5  16.0   5.0  15   6  7.57 -0.20  7.38  115.9  108.2   7.7   98.2  0.245  0.485  0.587  0.557  13.1  26.2   0.185   0.507    10.4  79.7     0.152     Vivint Smart Home Arena    21290       1935
2    3.0         Milwaukee Bucks  28.0  12.0   8.0  15   5  8.15 -0.79  7.36  118.4  110.4   8.0  101.8  0.225  0.425  0.598  0.576  12.5  24.7   0.164   0.532    11.7  79.7     0.161                Fiserv Forum        0          0
3    4.0    Los Angeles Clippers  29.1  16.0   6.0  16   6  7.23 -1.13  6.10  117.8  110.4   7.4   97.4  0.235  0.413  0.603  0.565  12.1  22.2   0.200   0.537    13.3  80.0     0.189              STAPLES Center        0          0
4    5.0          Denver Nuggets  26.5  12.0   8.0  13   7  4.95 -0.19  4.76  117.0  112.0   5.0   97.1  0.244  0.383  0.587  0.556  12.5  26.1   0.187   0.551    13.8  78.7     0.204                  Ball Arena        0          0
5    6.0           Brooklyn Nets  28.1  14.0   9.0  14   9  4.48 -0.56  3.92  117.9  113.5   4.4  101.9  0.264  0.415  0.620  0.584  13.4  20.5   0.217   0.524    10.6  77.6     0.192             Barclays Center        0          0
6    7.0            Phoenix Suns  26.6  11.0   8.0  11   8  2.84  0.24  3.08  110.8  108.0   2.8   97.5  0.214  0.428  0.572  0.537  12.3  18.9   0.179   0.521    12.4  80.0     0.193          Phoenix Suns Arena        0          0
7    8.0      Philadelphia 76ers  26.7  15.0   6.0  13   8  4.19 -1.13  3.06  111.5  107.4   4.1  101.6  0.299  0.351  0.576  0.538  13.8  23.7   0.228   0.515    13.4  77.9     0.199          Wells Fargo Center        0          0
8    9.0           Atlanta Hawks  24.3  10.0  10.0  12   8  2.50  0.26  2.76  112.2  109.7   2.5   99.2  0.298  0.396  0.564  0.517  12.9  25.1   0.243   0.506    11.5  77.1     0.203            State Farm Arena     3529        353
9   10.0          Boston Celtics  25.5  11.0   8.0  11   8  2.53 -0.03  2.50  112.4  109.8   2.6   99.3  0.236  0.359  0.570  0.541  13.4  25.3   0.178   0.536    13.8  77.9     0.209                   TD Garden        0          0
10  11.0       Memphis Grizzlies  24.8   9.0   7.0   9   7  1.31  1.15  2.47  108.9  107.6   1.3  100.6  0.192  0.327  0.551  0.523  12.4  23.0   0.149   0.530    14.6  77.5     0.190                 FedEx Forum      410         51
11  12.0          Indiana Pacers  26.8  12.0   9.0  12   9  2.71 -0.33  2.38  113.0  110.3   2.7   99.9  0.238  0.381  0.583  0.553  12.6  20.4   0.182   0.533    13.3  76.9     0.194     Bankers Life Fieldhouse        0          0
12  13.0         Houston Rockets  28.4  10.0   9.0  11   8  2.95 -0.97  1.98  109.4  106.5   2.9  102.1  0.255  0.445  0.573  0.541  13.7  19.3   0.193   0.512    13.5  76.8     0.195               Toyota Center    28141       3127
13  14.0         Toronto Raptors  27.2   9.0  12.0  12   9  1.67 -1.33  0.34  111.6  109.9   1.7  100.2  0.238  0.479  0.570  0.532  12.9  22.0   0.195   0.533    14.9  77.4     0.234                Amalie Arena    10989        999
14  15.0        Dallas Mavericks  26.4   8.0  13.0   9  12 -2.00  2.00  0.00  109.6  111.6  -2.0   98.7  0.264  0.411  0.559  0.525  11.3  18.5   0.199   0.530    12.7  76.7     0.216    American Airlines Center        0          0
15  16.0       San Antonio Spurs  26.9  11.0  10.0  10  11 -1.05  0.92 -0.13  110.3  111.3  -1.0  100.3  0.224  0.331  0.550  0.516  10.0  19.9   0.175   0.547    12.5  78.8     0.156                 AT&T Center        0          0
16  17.0   Golden State Warriors  26.7  11.0  10.0  10  11 -1.05  0.77 -0.28  108.6  109.6  -1.0  103.2  0.262  0.417  0.563  0.527  12.7  18.4   0.201   0.514    13.5  75.8     0.249                Chase Center        0          0
17  18.0       Charlotte Hornets  24.8  10.0  11.0  10  11 -0.62 -0.41 -1.03  110.2  110.8  -0.6   99.0  0.247  0.414  0.560  0.529  13.1  23.3   0.185   0.544    14.1  75.0     0.166             Spectrum Center        0          0
18  19.0         New York Knicks  24.4   9.0  13.0  10  12 -2.00  0.53 -1.47  107.1  109.2  -2.1   95.4  0.264  0.319  0.538  0.500  12.6  23.8   0.203   0.503    10.7  76.9     0.198  Madison Square Garden (IV)        0          0
19  20.0  Portland Trail Blazers  27.3  11.0   9.0   9  11 -1.65 -0.07 -1.72  115.0  116.6  -1.6   99.8  0.229  0.460  0.567  0.529  10.1  21.5   0.190   0.560    12.2  78.0     0.209                 Moda Center        0          0
20  21.0           Chicago Bulls  24.9   8.0  11.0   8  11 -2.26  0.36 -1.90  110.9  113.1  -2.2  103.4  0.246  0.413  0.590  0.556  15.2  20.8   0.196   0.553    12.9  80.0     0.217               United Center        0          0
21  22.0    New Orleans Pelicans  25.1   7.0  12.0   8  11 -2.58 -0.17 -2.75  110.3  112.8  -2.5   99.6  0.284  0.365  0.558  0.526  13.4  25.6   0.203   0.549    12.8  79.9     0.193        Smoothie King Center     8820        980
22  23.0         Detroit Pistons  26.3   5.0  16.0   7  14 -4.67  1.82 -2.85  107.7  112.4  -4.7   98.5  0.273  0.408  0.544  0.501  13.0  22.6   0.215   0.558    14.2  76.6     0.194        Little Caesars Arena        0          0
23  24.0     Cleveland Cavaliers  24.7  10.0  11.0   8  13 -4.19  0.04 -4.15  104.9  109.1  -4.2   97.2  0.254  0.309  0.536  0.505  14.4  25.9   0.181   0.537    14.9  75.3     0.170         Quicken Loans Arena    12564       1142
24  25.0              Miami Heat  26.7   7.0  13.0   7  13 -5.45  0.29 -5.16  106.9  112.3  -5.4   98.9  0.263  0.452  0.581  0.547  15.7  17.0   0.204   0.543    13.3  76.6     0.183      AmericanAirlines Arena        0          0
25  26.0        Sacramento Kings  25.7   9.0  11.0   7  13 -5.80  0.45 -5.35  112.7  118.4  -5.7  100.1  0.283  0.377  0.576  0.546  13.0  23.5   0.203   0.558    11.6  75.8     0.194             Golden 1 Center        0          0
26  27.0      Washington Wizards  26.2   4.0  13.0   6  11 -5.29 -0.85 -6.14  112.1  117.2  -5.1  104.4  0.282  0.374  0.569  0.534  11.6  20.7   0.212   0.565    12.8  78.9     0.251           Capital One Arena        0          0
27  28.0   Oklahoma City Thunder  23.7   8.0  11.0   5  14 -8.26  0.61 -7.66  105.2  113.3  -8.1  101.3  0.243  0.446  0.556  0.527  12.9  15.7   0.176   0.537    10.9  77.7     0.157     Chesapeake Energy Arena        0          0
28  29.0           Orlando Magic  26.2   8.0  14.0   6  16 -6.82 -1.40 -8.22  105.5  112.3  -6.8   98.9  0.220  0.358  0.526  0.490  12.2  24.1   0.174   0.547    12.4  79.7     0.173                Amway Center    35768       3252
29  30.0  Minnesota Timberwolves  23.5   5.0  15.0   5  15 -9.30  0.55 -8.76  104.6  113.7  -9.1  101.1  0.230  0.377  0.530  0.497  12.7  23.3   0.174   0.539    13.3  75.0     0.217               Target Center        0          0
30   NaN          League Average  26.3   NaN   NaN  10  10  0.00  0.00  0.00  111.1  111.1   NaN   99.8  0.250  0.396  0.568  0.534  12.8  22.2   0.193   0.534    12.8  77.8     0.193                         NaN     4050        400

If you look at the source HTML, you'll see that the tables inside comments start with <!--
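A quick way to confirm that a table lives inside a comment rather than in the regular markup is to compare a normal lookup with a search through the comment nodes. The following is a minimal sketch on a made-up miniature page; SAMPLE_HTML and the ids live_table and hidden_table are hypothetical and are not taken from basketball-reference.com.

```python
from bs4 import BeautifulSoup, Comment

# Hypothetical miniature page: one regular table and one table wrapped in a comment.
SAMPLE_HTML = """
<html><body>
<table id="live_table"><tr><td>visible</td></tr></table>
<!--
<table id="hidden_table"><tr><td>commented out</td></tr></table>
-->
</body></html>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# A normal lookup only sees tags that are part of the parsed tree.
live = soup.find("table", id="live_table")
hidden = soup.find("table", id="hidden_table")

# The commented-out markup is still reachable as plain text inside Comment nodes.
comments = soup.find_all(string=lambda text: isinstance(text, Comment))
in_comment = any('id="hidden_table"' in str(c) for c in comments)

print("live table found:", live is not None)
print("hidden table found by find():", hidden is not None)
print("hidden table present in a comment:", in_comment)
```

If the normal lookup finds nothing but the id shows up in a comment, the comment-extraction approach is the one to use.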

BeautifulSoup does not parse the contents of comments as tags, so a normal lookup such as soup.find('table', ...) skips over those tables. Hence, you need to add the part of the code that specifically finds the comments: comments = soup.find_all(string=lambda text: isinstance(text, Comment)). Once you have all the comments, you can iterate through them to see whether there's a table in each one. If there is a table, parse it as you normally would with non-commented <table> tags.
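The same steps can also be done with BeautifulSoup alone by parsing each comment's text a second time. This is only a sketch, not the method used in the answer above: the helper names find_commented_table and table_rows are hypothetical, and SAMPLE_HTML is invented data that merely reuses the misc_stats id.

```python
from bs4 import BeautifulSoup, Comment


def find_commented_table(html, table_id):
    """Return the <table> with the given id found inside an HTML comment, or None."""
    soup = BeautifulSoup(html, "html.parser")
    for comment in soup.find_all(string=lambda text: isinstance(text, Comment)):
        if "<table" not in comment:
            continue
        # Re-parse the comment's text so its markup becomes real tags.
        inner = BeautifulSoup(str(comment), "html.parser")
        table = inner.find("table", id=table_id)
        if table is not None:
            return table
    return None


def table_rows(table):
    """Flatten a <table> tag into a list of rows, each a list of cell strings."""
    return [
        [cell.get_text(strip=True) for cell in row.find_all(["th", "td"])]
        for row in table.find_all("tr")
    ]


# Hypothetical sample input, not real site data.
SAMPLE_HTML = """
<div>
<!--
<table id="misc_stats">
  <tr><th>Team</th><th>Pace</th></tr>
  <tr><td>Team A</td><td>99.3</td></tr>
  <tr><td>Team B</td><td>98.2</td></tr>
</table>
-->
</div>
"""

table = find_commented_table(SAMPLE_HTML, "misc_stats")
rows = table_rows(table)
for row in rows:
    print(row)
```

Returning None when nothing matches lets the caller decide how to handle a missing table, instead of raising an IndexError on an empty list.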
