
'NoneType' object has no attribute 'text' | BeautifulSoup

I've just started learning Python web scraping, and I wanted to learn how to scrape NFL data from Pro Football Reference to show all the players and their stats, but BeautifulSoup gives me this error.

import requests
from bs4 import BeautifulSoup

url = "https://www.pro-football-reference.com/years/2021/passing.htm"

r = requests.get(url)
soup = BeautifulSoup(r.text, 'html.parser')

league_table = soup.find('table', class_ = 'per_match_toggle sortable stats_table')

for name in league_table.find_all('tbody'):
    rows = name.find_all('tr')
    for row in rows:
        name = row.find('td', class_ = 'left').text.strip()
        yards = row.find_all('td', class_ = 'right')[7].text
        touchdowns = row.find_all('td', class_ = 'right')[8].text
        print("Name " + name + " Yards " + yards +  " Touchdowns " + touchdowns)

Error:

name = row.find('td', class_ = 'left').text.strip()
AttributeError: 'NoneType' object has no attribute 'text'

This happens because find() returns None when no matching element exists, and None, obviously, has no text attribute.

That can happen when the element you are searching for doesn't exist, or when you're passing the wrong arguments to the search function.
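As an illustration of the "wrong argument" case, here is a small sketch using made-up HTML (not the real page's markup). When class_ is given a string containing several classes, BeautifulSoup compares it with the whole class attribute value, so the same classes in a different order match nothing and find() returns None.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: same three classes as the question, different order.
html = '<table class="sortable stats_table per_match_toggle"><tr><td>x</td></tr></table>'
soup = BeautifulSoup(html, 'html.parser')

# A multi-class string must equal the whole class attribute, so this finds nothing.
print(soup.find('table', class_='per_match_toggle sortable stats_table'))  # None

# Matching one class, or a CSS selector listing all classes, ignores order.
print(soup.find('table', class_='stats_table') is not None)  # True
print(soup.select_one('table.per_match_toggle.sortable.stats_table') is not None)  # True
```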

You should wrap the problematic section in a try/except block, or guard it with an if/else check, to handle such cases.
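A minimal sketch of both guards, run against a hypothetical miniature table. The HTML and the helper names names_with_if and names_with_try are made up for illustration; the real page's rows contain many more cells.

```python
from bs4 import BeautifulSoup

# Hypothetical miniature of the table: a data row, a repeated header row
# made only of <th> cells, then another data row.
HTML = """
<table><tbody>
  <tr><td class="left">Player A</td><td class="right">100</td></tr>
  <tr><th>Player</th><th>Yds</th></tr>
  <tr><td class="left">Player B</td><td class="right">200</td></tr>
</tbody></table>
"""

def names_with_if(soup):
    """Skip rows where find() returns None (if/else guard)."""
    names = []
    for row in soup.find_all('tr'):
        cell = row.find('td', class_='left')
        if cell is None:  # header row: no <td class="left">
            continue
        names.append(cell.text.strip())
    return names

def names_with_try(soup):
    """Same result, catching the AttributeError instead (try/except)."""
    names = []
    for row in soup.find_all('tr'):
        try:
            names.append(row.find('td', class_='left').text.strip())
        except AttributeError:  # None has no .text
            continue
    return names

soup = BeautifulSoup(HTML, 'html.parser')
print(names_with_if(soup))   # ['Player A', 'Player B']
print(names_with_try(soup))  # ['Player A', 'Player B']
```

The if/else version is more explicit about which rows are skipped; the try/except version will also silently skip any other row that happens to lack the cell, which may or may not be what you want.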

This happens because, as you'll notice, right after Jameis Winston there's a row of headers. That <tr> tag is made up of <th> tags, not <td> tags. When your loop reaches that row and calls .find('td', class_ = 'left'), there is no <td> to find, so it returns None. Then you try to get .text from it, and you can't get .text from None.

So you'll need to either use a try/except, as the previous answer suggested, or add logic that only processes rows containing <td> tags.
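One way to write the "only rows with <td> tags" logic, sketched against a hypothetical miniature of the table. The HTML and the data_rows helper are illustrative assumptions, not the site's real markup; the sketch assumes data rows may also contain a <th> (such as a rank cell), which is why it checks for the presence of a <td> rather than the absence of a <th>.

```python
from bs4 import BeautifulSoup

# Hypothetical miniature of the stats table; the middle row is a repeated
# header made only of <th> cells.
HTML = """
<table><tbody>
  <tr><th>1</th><td class="left">Player A</td><td class="right">4000</td></tr>
  <tr><th>Rk</th><th>Player</th><th>Yds</th></tr>
  <tr><th>2</th><td class="left">Player B</td><td class="right">3500</td></tr>
</tbody></table>
"""

def data_rows(table):
    """Yield only the <tr> elements that contain at least one <td> cell."""
    for row in table.find_all('tr'):
        if row.find('td') is not None:
            yield row

soup = BeautifulSoup(HTML, 'html.parser')
table = soup.find('table')

stats = []
for row in data_rows(table):
    name = row.find('td', class_='left').text.strip()
    yards = row.find('td', class_='right').text.strip()
    stats.append((name, yards))

print(stats)  # [('Player A', '4000'), ('Player B', '3500')]
```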

Personally, I'd just use pandas to grab the table, remove those repeated header rows, and iterate through the rows that are left.

import pandas as pd

url = "https://www.pro-football-reference.com/years/2021/passing.htm"
df = pd.read_html(url)[0]           # first <table> found on the page
df = df[df['Player'].ne('Player')]  # drop the repeated header rows

for _, row in df.iterrows():
    name = row['Player']
    yards = row['Yds']
    touchdowns = row['TD']
    # an f-string works whether the cells were parsed as strings or numbers
    print(f"Name {name} Yards {yards} Touchdowns {touchdowns}")
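The ne('Player') filter assumes read_html returns each repeated header as an ordinary row whose cells repeat the column names. Because that header text is mixed into the columns, the numeric columns may come back as strings. This sketch uses a hand-built DataFrame in place of the downloaded table (the player names and values are invented) to show the filter followed by a conversion to numbers with pd.to_numeric, so you can sort or do arithmetic on the stats.

```python
import pandas as pd

# Hypothetical stand-in for pd.read_html(url)[0]: the repeated header row
# appears as a data row whose cells repeat the column names.
df = pd.DataFrame({
    'Player': ['Player A', 'Player', 'Player B'],
    'Yds':    ['4000', 'Yds', '3500'],
    'TD':     ['30', 'TD', '25'],
})

# Keep rows whose Player cell is not the literal header text.
# .copy() avoids a SettingWithCopyWarning on the assignment below.
df = df[df['Player'].ne('Player')].copy()

# The remaining cells are still strings; convert them to numbers.
df[['Yds', 'TD']] = df[['Yds', 'TD']].apply(pd.to_numeric)

for row in df.itertuples(index=False):
    print(f"Name {row.Player} Yards {row.Yds} Touchdowns {row.TD}")
```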
