
I can't figure out why my web scraping code isn't working

I am very new to coding and I am trying to build a web scraper that exports data for Excel so that I can transfer it to Google Sheets. Unfortunately, the code that I have written works for other people, but not for me.

This is the code I have written:

import requests
from bs4 import BeautifulSoup, Comment
import pandas as pd
URL = 'https://www.hockey-reference.com/leagues/NHL_2021.html'
csv_name = 'nhl_season_stats.csv'
def get_nhl_stats(URL):
    headers = {'User-Agent':'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/47.0.2526.106 Safari/537.36'}
    pageTree = requests.get(URL, headers=headers)
    pageSoup = BeautifulSoup(pageTree.content, 'html.parser')
    comments = pageSoup.find_all(string=lambda text: isinstance(text, Comment))  
    tables = []
    for each in comments:
        if 'table' in each:
            try:
                tables.append(pd.read_html(each, header=1)[0])
            except:
                continue    
    df = tables[0]
    df = df.rename(columns={'Unnamed: 1':'Team'})
    df.to_csv(csv_name, index = False)
    print(df)

get_nhl_stats(URL)

After running it, I receive this error:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 13, in get_nhl_stats
IndexError: list index out of range

Sorry for my bad jargon, as I am very new and very confused, but any help would be greatly appreciated!

The code below works for me. The problem may be in how the "Comment" class is used in the filter, or the server may not be returning the content you expect:

import requests
from bs4 import BeautifulSoup
import pandas as pd
URL = 'https://www.hockey-reference.com/leagues/NHL_2021.html'
csv_name = 'nhl_season_stats.csv'
def get_nhl_stats(URL):
    headers = {'User-Agent':'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/47.0.2526.106 Safari/537.36'}
    pageTree = requests.get(URL, headers=headers)
    pageSoup = BeautifulSoup(pageTree.content, 'html.parser')
    comments = pageSoup.find_all(string=lambda text: isinstance(text, str))    
    tables = []
    for each in comments:
        if 'table' in each:
            try:
                tables.append(pd.read_html(each, header=1)[0])
            except:
                continue    
    df = tables[0]
    df = df.rename(columns={'Unnamed: 1':'Team'})
    df.to_csv(csv_name, index = False)
    print(df)

get_nhl_stats(URL)

output:

      Rk                   Team  AvAge  GP  W  L  OL  PTS   PTS%  GF  GA  SOW  SOL   SRS   SOS  TG/G  EVGF  EVGA  PP  PPO    PP%  PPA  PPOA     PK%  SH  SHA  PIM/G  oPIM/G    S    S%   SA    SV%  SO
      0    1.0    Toronto Maple Leafs   29.0   6  4  2   0    8  0.667  19  17  0.0  0.0  0.33 -0.01  6.00    11    12   8   18  44.44    4    22   81.82   0    1   10.5     7.5  190  10.0  157  0.892   0
      1    2.0     Montreal Canadiens   28.6   5  3  0   2    8  0.800  24  15  0.0  1.0  0.77 -0.83  7.80    14     8   6   20  30.00    6    25   76.00   4    1   11.4    10.6  180  13.3  140  0.893   0
      2    3.0   Vegas Golden Knights   28.9   5  4  1   0    8  0.800  18  12  0.0  0.0  1.12 -0.08  6.00    15     8   2   18  11.11    3    18   83.33   1    1    7.2     7.2  150  12.0  125  0.904   0
      3    4.0         Minnesota Wild   29.1   5  4  1   0    8  0.800  15  10  0.0  0.0  0.86 -0.14  5.00    13     9   1   23   4.35    1    16   93.75   1    0    7.6    10.4  166   9.0  147  0.932   0
      4    5.0    Washington Capitals   30.1   5  3  0   2    8  0.800  18  16  1.0  1.0  0.10 -0.30  6.80    16    12   2    9  22.22    3    18   83.33   0    1    8.6     5.0  130  13.8  141  0.887   0
      5    6.0    Philadelphia Flyers   27.0   5  3  1   1    7  0.700  19  15  0.0  1.0  0.36 -0.24  6.80    14    10   5   17  29.41    5    18   72.22   0    0    7.2     6.8  125  15.2  187  0.920   1
      6    7.0     Colorado Avalanche   26.9   5  3  2   0    6  0.600  17  12  0.0  0.0  0.47 -0.53  5.80     7     9  10   25  40.00    3    19   84.21   0    0    8.0    10.4  147  11.6  143  0.916   1
      7    8.0          Winnipeg Jets   27.9   4  3  1   0    6  0.750  13  10  0.0  0.0  1.10  0.35  5.75    11     6   2   20  10.00    4    12   66.67   0    0   10.3    14.3  119  10.9  134  0.925   0
      8    9.0     New York Islanders   28.9   4  3  1   0    6  0.750   9   6  0.0  0.0  0.61 -0.14  3.75     5     5   4   20  20.00    1    15   93.33   0    0   11.5    11.0  108   8.3  114  0.947   2
      9   10.0    Tampa Bay Lightning   27.7   3  3  0   0    6  1.000  13   5  0.0  0.0  1.70 -0.97  6.00    11     2   2    8  25.00    3    11   72.73   0    0    9.0     7.0  107  12.1   85  0.941   0
      10  11.0    Pittsburgh Penguins   28.6   5  3  2   0    6  0.600  16  21  2.0  0.0 -0.43  0.17  7.40    10    16   5   18  27.78    5    19   73.68   1    0    7.6     7.2  152  10.5  130  0.838   0
      11  12.0      New Jersey Devils   26.2   4  2  1   1    5  0.625   9  10  0.0  1.0 -0.35  0.15  4.75     8     3   1   11   9.09    6    16   62.50   0    1    9.8     7.3  112   8.0  150  0.933   0
      12  13.0        St. Louis Blues   28.3   4  2  1   1    5  0.625  10  14  0.0  1.0 -1.66 -0.41  6.00    10     6   0   14   0.00    8    21   61.90   0    0   11.0     7.5  109   9.2  129  0.891   0
      13  14.0          Boston Bruins   28.8   4  2  1   1    5  0.625   7   9  2.0  0.0  0.07  0.07  4.00     3     7   3   13  23.08    2    18   88.89   1    0   11.3     8.8  135   5.2   96  0.906   0
      14  15.0        Arizona Coyotes   28.4   5  2  2   1    5  0.500  17  17  0.0  1.0 -0.04  0.16  6.80    11    11   5   22  22.73    5    24   79.17   1    1   10.4     9.6  144  11.8  157  0.892   0
      15  16.0         Calgary Flames   28.1   3  2  0   1    5  0.833  11   6  0.0  0.0  1.14 -0.52  5.67     5     4   6   16  37.50    1    12   91.67   0    1    8.7    11.3   93  11.8   93  0.935   1
      16  17.0        Edmonton Oilers   27.9   6  2  4   0    4  0.333  15  20  0.0  0.0 -0.91 -0.08  5.83    10    14   3   23  13.04    4    18   77.78   2    2    7.7     9.3  192   7.8  200  0.900   0
      17  18.0      Vancouver Canucks   27.3   6  2  4   0    4  0.333  17  28  1.0  0.0 -1.34  0.33  7.50    12    17   4   26  15.38    9    31   70.97   1    2   13.3    10.7  179   9.5  222  0.874   0
      18  19.0          Anaheim Ducks   28.6   5  1  2   2    4  0.400   8  13  0.0  0.0 -0.10  0.90  4.20     8    10   0   12   0.00    2    15   86.67   0    1    6.4     5.2  133   6.0  160  0.919   1
      19  20.0  Columbus Blue Jackets   26.6   5  1  2   2    4  0.400  10  16  0.0  0.0 -1.19  0.01  5.20     9    15   1   11   9.09    1    10   90.00   0    0    9.0     9.4  152   6.6  169  0.905   0
      20  21.0      Los Angeles Kings   28.3   4  1  1   2    4  0.500  12  13  0.0  0.0  0.43  0.68  6.25     8    10   4   17  23.53    3    21   85.71   0    0   11.0     9.0  119  10.1  121  0.893   0
      21  22.0      Detroit Red Wings   29.3   5  2  3   0    4  0.400  10  14  0.0  0.0 -1.54 -0.74  4.80     9     9   1   12   8.33    4    16   75.00   0    1   11.4     9.8  130   7.7  155  0.910   0
      22  23.0        San Jose Sharks   29.4   5  2  3   0    4  0.400  12  18  2.0  0.0 -1.32 -0.52  6.00     7    16   5   21  23.81    2    18   88.89   0    0    8.4     9.6  162   7.4  148  0.878   0
      23  24.0    Carolina Hurricanes   27.0   3  2  1   0    4  0.667   9   6  0.0  0.0  0.26 -0.74  5.00     6     5   3   12  25.00    1     9   88.89   0    0    7.7     9.7   98   9.2   68  0.912   1
      24  25.0       Florida Panthers   27.8   2  2  0   0    4  1.000  10   6  0.0  0.0  1.29 -0.71  8.00     7     3   3    8  37.50    3     5   40.00   0    0    5.0     8.0   66  15.2   66  0.909   0
      25  26.0    Nashville Predators   28.7   4  2  2   0    4  0.500  10  14  0.0  0.0  0.01  1.01  6.00     9     7   1   16   6.25    6    16   62.50   0    1    8.0     8.0  135   7.4  126  0.889   0
      26  27.0         Buffalo Sabres   27.2   5  1  3   1    3  0.300  14  15  0.0  1.0 -0.18  0.22  5.80    11    14   3   17  17.65    1     6   83.33   0    0    3.8     8.2  161   8.7  133  0.887   0
      27  28.0       New York Rangers   25.6   4  1  2   1    3  0.375  11  11  0.0  1.0 -0.15  0.11  5.50     7     7   4   21  19.05    4    16   75.00   0    0    8.5    14.0  140   7.9  112  0.902   1
      28  29.0     Chicago Blackhawks   26.9   5  1  3   1    3  0.300  13  21  0.0  0.0 -0.43  1.17  6.80     5    16   7   17  41.18    5    20   75.00   1    0    8.0     6.8  154   8.4  167  0.874   0
      29  30.0        Ottawa Senators   27.0   4  1  2   1    3  0.375  11  14  0.0  0.0 -0.04  0.71  6.25     8    10   3   18  16.67    4    21   80.95   0    0   14.3    15.3  113   9.7  120  0.883   0
      30  31.0           Dallas Stars   28.8   1  1  0   0    2  1.000   7   0  0.0  0.0  7.30  0.30  7.00     1     0   5    8  62.50    0     5  100.00   1    0   10.0    16.0   28  25.0   34  1.000   1
      31   NaN         League Average   28.0   4  2  2   1    5  0.574  13  13  NaN  NaN   NaN   NaN  5.94     9     9   4   16  21.33    4    16   78.67   0    0    8.0     8.0  133   9.8  133  0.902   0
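
The IndexError in the question is raised by `tables[0]`, which means no table was ever appended to `tables`: either no comment containing a table was found, or every `pd.read_html` call failed and was silently skipped by the bare `except`. The sketch below is one way to make that failure visible instead of crashing later. It runs on a small made-up HTML string (`SAMPLE_HTML`) rather than the real page, and the helper names `find_table_comments` and `first_table_comment` are hypothetical, not part of either version of the code above. It also shows why swapping `Comment` for `str` in the filter still picks up the comments: `Comment` is a subclass of `str`, so the `str` filter matches every text node, comments included.

```python
from bs4 import BeautifulSoup, Comment

# Made-up sample page (an assumption for illustration): the word "table"
# appears in ordinary text, and a real <table> is hidden inside a comment.
SAMPLE_HTML = """
<html><body>
<p>table of contents</p>
<!-- <table><tr><th>Team</th></tr><tr><td>Dallas Stars</td></tr></table> -->
</body></html>
"""


def find_table_comments(html):
    """Return the text of every HTML comment that contains a <table> tag."""
    soup = BeautifulSoup(html, 'html.parser')
    comments = soup.find_all(string=lambda text: isinstance(text, Comment))
    # Test for '<table' rather than 'table' so plain prose does not match.
    return [str(c) for c in comments if '<table' in c]


def first_table_comment(html):
    """Return the first commented-out table, or fail with a clear message."""
    found = find_table_comments(html)
    if not found:
        raise ValueError(
            'no commented-out <table> found; '
            'check the response status code and body'
        )
    return found[0]


# Comment derives from str, so a filter on str is a superset of a filter
# on Comment: it matches ordinary text nodes as well as comments.
soup = BeautifulSoup(SAMPLE_HTML, 'html.parser')
only_comments = soup.find_all(string=lambda text: isinstance(text, Comment))
all_strings = soup.find_all(string=lambda text: isinstance(text, str))

print(issubclass(Comment, str))
print(len(only_comments), len(all_strings) > len(only_comments))
print(first_table_comment(SAMPLE_HTML).strip()[:7])
```

In the original function, the same idea would mean checking `pageTree.status_code` (or calling `pageTree.raise_for_status()`) right after `requests.get`, and testing `if not tables:` before `df = tables[0]`. Whether the server actually returned an error or a different page in the asker's case is not known from the thread; printing the status code and the first few hundred characters of `pageTree.text` would be a reasonable first check.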
