
How can I fix this attribute error with my novice webscraping code?

I'm trying to learn web scraping by scraping the height and weight of NFL players, grouped by position.

Here is my code:

import requests
from bs4 import BeautifulSoup

# Base URL for the NFL player stats page
base_url = 'https://www.pro-football-reference.com/players/'

# List to store player data
player_data = []

# Loop through the years 2014 to 2021
for year in range(2014, 2022):
  # Send a GET request to the URL
  response = requests.get(f'{base_url}{year}/')

  # Parse the HTML of the page
  soup = BeautifulSoup(response.text, 'html.parser')

  # Find all rows in the player stats table
  rows = soup.find('table', {'id': 'players'}).tbody.find_all('tr')[1:]
    
 

  # Loop through each row
for row in rows:
    # Find the player name cell
    name_cell = row.find('th')

    # Check if the cell is valid (some rows may not have player data)
if name_cell:
          # Extract the player name and link
    try:
            name = name_cell.a.text
    except AttributeError:
            name = ''
   #35 
    try:
            position = row.find('td', {'data-stat': 'position'}).text
    except AttributeError:
            position = ''
            
    try:
            link = name_cell.a['href']
    except AttributeError:
            link = ''
        
      # Extract the player height and weight
    try:
            height = row.find('td', {'data-stat': 'height'}).text
    except AttributeError:
            height = ''
            
    try:    
                weight = row.find('td', {'data-stat': 'weight'}).text
    except AttributeError:
                weight = ''
        
      # Add the player data to the list
player_data.append({
        'name': name,
        'position': position,
        'link': link,
        'height': height,
        'weight': weight
      })

# Print the player data
print(player_data)
c.execute(player_data)
getAll('player_data',c)
querySave(player_data, c, 'NFLHeightWeight')

print("Done!")


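I get the error: AttributeError: 'NoneType' object has no attribute 'tbody'

I've seen this error in other questions, but the solutions there didn't really work for me.

How can I solve this for my specific case? I've tried to make sure that what I'm searching for isn't empty.

Thanks!

Check that you actually found the table before trying to access its tbody. soup.find() returns None when nothing matches, and None has no tbody attribute.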

for year in range(2014, 2022):
    # Send a GET request to the URL
    response = requests.get(f'{base_url}{year}/')

    # Parse the HTML of the page
    soup = BeautifulSoup(response.text, 'html.parser')
    table = soup.find('table', {'id': 'players'})
    if table:
        rows = table.tbody.find_all('tr')[1:]
    else:
        print(f"No players table found for year {year}")
        continue

    # rest of loop here
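
To see the check in isolation, here is a minimal offline sketch. The helper name find_player_rows and both HTML strings are made up for illustration; they are not taken from the real site. The sketch also falls back to the table itself when there is no explicit tbody, because html.parser does not insert one. Unlike the snippet above, it does not skip the first row with [1:].

```python
from bs4 import BeautifulSoup


def find_player_rows(html):
    """Return the <tr> rows of the table with id="players", or [] if it is absent."""
    soup = BeautifulSoup(html, 'html.parser')
    table = soup.find('table', {'id': 'players'})
    if table is None:
        # find() returned None: there is nothing to call .tbody on
        return []
    # html.parser does not add a <tbody>, so fall back to the table itself
    body = table.tbody or table
    return body.find_all('tr')


# Made-up HTML for illustration only
with_table = (
    '<table id="players"><tbody>'
    '<tr><th>A</th></tr><tr><th>B</th></tr>'
    '</tbody></table>'
)
without_table = '<html><body><p>No table on this page</p></body></html>'

print(len(find_player_rows(with_table)))     # 2
print(len(find_player_rows(without_table)))  # 0
```

If the second case is what you get for every year, it may be worth printing response.status_code and the start of response.text to see what the server actually returned.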

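Also, the for row in rows: loop needs to be indented so that it sits inside the for year in range(2014, 2022): loop. Otherwise it only runs once, after the outer loop has finished, and uses the rows from the last year only. The same applies to the if name_cell: block and the player_data.append(...) call: they have to be indented inside the row loop, or only one record gets appended.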
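
Putting both points together, the overall shape could look like the sketch below. It is only an illustration under stated assumptions: collect_players, cell_text, the pages dictionary, and the extra 'year' key are hypothetical names introduced here, and the HTML is made up so the example runs without network access. With the real site, each page's HTML would come from requests.get(...).text.

```python
from bs4 import BeautifulSoup


def cell_text(row, stat):
    """Text of the <td> with the given data-stat, or '' if the cell is missing."""
    cell = row.find('td', {'data-stat': stat})
    return cell.text if cell else ''


def collect_players(pages):
    """pages maps a year to that page's HTML (hypothetical input for this sketch)."""
    player_data = []
    for year, html in pages.items():
        soup = BeautifulSoup(html, 'html.parser')
        table = soup.find('table', {'id': 'players'})
        if table is None:
            print(f"No players table found for year {year}")
            continue
        # The row loop is indented under the year loop, so every year's rows are used
        for row in (table.tbody or table).find_all('tr'):
            name_cell = row.find('th')
            if name_cell is None or name_cell.a is None:
                continue  # header or separator row without a player link
            # The append is inside the row loop: one record per player row
            player_data.append({
                'year': year,
                'name': name_cell.a.text,
                'link': name_cell.a.get('href', ''),
                'position': cell_text(row, 'position'),
                'height': cell_text(row, 'height'),
                'weight': cell_text(row, 'weight'),
            })
    return player_data


# Made-up pages for illustration only
pages = {
    2014: '<table id="players"><tbody>'
          '<tr><th><a href="/p/a">Al Able</a></th>'
          '<td data-stat="position">QB</td><td data-stat="height">6-2</td>'
          '<td data-stat="weight">220</td></tr></tbody></table>',
    2015: '<p>no table here</p>',
    2016: '<table id="players"><tbody>'
          '<tr><th><a href="/p/b">Bo Baker</a></th>'
          '<td data-stat="position">WR</td></tr></tbody></table>',
}
players = collect_players(pages)
print([p['name'] for p in players])  # ['Al Able', 'Bo Baker']
```

One design note: the sketch checks for None explicitly instead of wrapping each lookup in try/except AttributeError. In the question's code, name_cell.a['href'] would raise TypeError rather than AttributeError when the cell has no link, so that except clause would not catch it.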

