I've just started learning Python web scraping, and I wanted to scrape the data from the NFL website to show all the players and their stats, but I get this error with BeautifulSoup.
import requests
from bs4 import BeautifulSoup
url = "https://www.pro-football-reference.com/years/2021/passing.htm"
r = requests.get(url)
soup = BeautifulSoup(r.text, 'html.parser')
league_table = soup.find('table', class_ = 'per_match_toggle sortable stats_table')
for name in league_table.find_all('tbody'):
    rows = name.find_all('tr')
    for row in rows:
        name = row.find('td', class_ = 'left').text.strip()
        yards = row.find_all('td', class_ = 'right')[7].text
        touchdowns = row.find_all('td', class_ = 'right')[8].text
        print("Name " + name + " Yards " + yards + " Touchdowns " + touchdowns)
Error:
name = row.find('td', class_ = 'left').text.strip()
This happens because find() can return None, which doesn't have a text attribute.
This can happen when the element you are searching for doesn't exist or you're passing the wrong argument to the search function.
You should wrap the problematic section in a try-except clause or an if-else check to deal with such scenarios.
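As a rough sketch of both options, here is the same lookup guarded with an explicit None check and with a try-except. The HTML string and the helper names (cell_text_checked, cell_text_try) are made up for illustration and are not from the real page.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup: one data row and one header-only row.
html = """
<table>
  <tr><th>1</th><td class="left">Player A</td></tr>
  <tr><th>Rk</th><th>Player</th></tr>
</table>
"""
soup = BeautifulSoup(html, "html.parser")


def cell_text_checked(row):
    """Return the text of the first td.left cell, or None if there is none."""
    cell = row.find("td", class_="left")
    if cell is None:  # find() found nothing in this row
        return None
    return cell.text.strip()


def cell_text_try(row):
    """Same result, but catching the AttributeError instead of checking first."""
    try:
        return row.find("td", class_="left").text.strip()
    except AttributeError:  # raised when find() returned None
        return None


names = [cell_text_checked(row) for row in soup.find_all("tr")]
print(names)
```

Either way, the caller still has to decide what to do with a None result, for example skipping that row.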
This happens because, right after Jameis Winston, there's a row of headers. That <tr> tag is made up of <th> tags, not <td> tags. When the loop reaches that row, .find('td') finds nothing and returns None. You then try to get .text from None, which raises the error.
So you'll need to either use a try/except, as the previous answer suggested, or add logic that only processes rows containing <td> tags.
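One way the "only rows with <td> tags" logic might look, assuming a table shaped like the one described above. The markup below is a made-up stand-in with a header row in the middle, and the column positions are illustrative only.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the stats table: the middle row holds only <th> tags.
html = """
<table>
  <tbody>
    <tr><th>1</th><td class="left">Player A</td><td class="right">100</td></tr>
    <tr><th>Rk</th><th>Player</th><th>Yds</th></tr>
    <tr><th>2</th><td class="left">Player B</td><td class="right">200</td></tr>
  </tbody>
</table>
"""
soup = BeautifulSoup(html, "html.parser")

players = []
for row in soup.find("tbody").find_all("tr"):
    cells = row.find_all("td")
    if not cells:  # header-only row: no <td> tags, so skip it
        continue
    players.append((cells[0].text.strip(), cells[1].text.strip()))

print(players)
```

Because find_all() returns an empty list rather than None when nothing matches, the emptiness check is enough to skip the repeated header rows.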
Personally, I'd just use pandas to grab the table, remove those header rows, and iterate through the remaining rows.
import pandas as pd
url = "https://www.pro-football-reference.com/years/2021/passing.htm"
# read_html returns a list of DataFrames, one per table; take the first
df = pd.read_html(url)[0]
# Drop the repeated header rows, where the 'Player' column just says 'Player'
df = df[df['Player'].ne('Player')]
for idx, row in df.iterrows():
    name = row['Player']
    yards = row['Yds']
    touchdowns = row['TD']
    print("Name " + name + " Yards " + yards + " Touchdowns " + touchdowns)
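To see what the ne('Player') filter does without hitting the site, here is a small sketch on a hand-built DataFrame. The rows are placeholder values, not real stats. It assumes the columns come through as strings, which is likely when header rows are mixed in with the data, and converts them with pd.to_numeric in case you want to do arithmetic on them.

```python
import pandas as pd

# Placeholder data mimicking a scraped table with a repeated header row.
df = pd.DataFrame({
    "Player": ["Player A", "Player", "Player B"],
    "Yds": ["100", "Yds", "200"],
    "TD": ["1", "TD", "2"],
})

# Keep only rows whose 'Player' value is not the literal header text.
clean = df[df["Player"].ne("Player")].copy()

# The remaining values are still strings; convert them for numeric work.
clean["Yds"] = pd.to_numeric(clean["Yds"])
clean["TD"] = pd.to_numeric(clean["TD"])

print(clean)
```

After the conversion, string concatenation as in the print call above would need str(yards), or an f-string instead.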