Python KeyError in web scraper

I am getting a KeyError: 'title' error in my web scraping program and am not sure what the issue is. When I use inspect element on the webpage, I can see the element that I am trying to find:

import pandas as pd
import requests
from bs4 import BeautifulSoup
import re

url = 'https://www.ncaagamesim.com/college-basketball-predictions.asp'
response = requests.get(url)
soup = BeautifulSoup(response.text, 'html.parser')

table = soup.find('table')

# Get column names
headers = table.find_all('th')
cols = [x.text for x in headers]

# Get all rows in table body
table_rows = table.find_all('tr')

rows = []
# Grab the text of each td, and put into a rows list
for each in table_rows[1:]:
    odd_avail = True
    data = each.find_all('td')
    time = data[0].text.strip()
    try:
        matchup, odds = data[1].text.strip().split('\xa0')
        odd_margin = float(odds.split('by')[-1].strip())
    except ValueError:
        matchup = data[1].text.strip()
        odd_margin = '-'
        odd_avail = False
    odd_team_win = data[1].find_all('img')[-1]['title']

    sim_team_win = data[2].find('img')['title']
    sim_margin = float(re.findall(r"\d+\.\d+", data[2].text)[-1])

    if odd_avail:
        if odd_team_win == sim_team_win:
            diff = sim_margin - odd_margin
        else:
            diff = -1 * odd_margin - sim_margin
    else:
        diff = '-'

    row = {cols[0]: time, 'Matchup': matchup, 'Odds Winner': odd_team_win, 'Odds': odd_margin,
           'Simulation Winner': sim_team_win, 'Simulation Margin': sim_margin, 'Diff': diff}
    rows.append(row)

df = pd.DataFrame(rows)
print(df.to_string())
# df.to_csv('odds.csv', index=False)

I am getting the error on the line that sets sim_team_win. It takes data[2], which is the third column on the website, and finds the img title to get the team name. Is it because the img title is within another div? Also, when I run this code it does not print out the "Odds" column, which is stored in the odd_margin variable. Is something wrong with how that variable is set? Thanks in advance for the help!

As for not finding the img title: if you look at the row for New Mexico @ Dixie State, there is no image in the third column, and no img title in the source either.
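
One way to guard against such rows is sketched below, using a small inline HTML snippet instead of the live page. The markup and the helper name img_title are assumptions made for illustration, not part of the original code. find returns None when a cell has no img, and get returns a default instead of raising KeyError when the title attribute is missing:

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for three cells of the third column.
html = """
<table>
  <tr><td><img src="a.png" title="Duke"> 5.2</td></tr>
  <tr><td><img src="b.png"> 3.1</td></tr>
  <tr><td>1.7</td></tr>
</table>
"""


def img_title(cell, default="-"):
    """Return the title of the first img in cell, or default if unavailable."""
    img = cell.find("img")
    if img is None:
        # The cell contains no <img> at all.
        return default
    # The <img> exists, but it may still lack a title attribute.
    return img.get("title", default)


soup = BeautifulSoup(html, "html.parser")
titles = [img_title(td) for td in soup.find_all("td")]
print(titles)
```

In the scraper loop, something like sim_team_win = img_title(data[2]) could then stand in for the direct ['title'] lookup.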

For the Odds column, after wrapping the sim_team_win assignment in a try/except, I get all the Odds values in the table.
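
The try/except could look roughly like this. It is a sketch under assumptions rather than the exact code used: the inline HTML and the '-' placeholder are made up for illustration, and TypeError is caught alongside KeyError because find('img') returns None when the cell has no image.

```python
from bs4 import BeautifulSoup

# Hypothetical rows: one with a titled image in the third column, one without.
html = """
<table>
  <tr><td>7:00</td><td>A @ B</td><td><img title="B"> 4.5</td></tr>
  <tr><td>8:00</td><td>C @ D</td><td>2.0</td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")
winners = []
for row in soup.find_all("tr"):
    data = row.find_all("td")
    try:
        sim_team_win = data[2].find("img")["title"]
    except (KeyError, TypeError):
        # KeyError: the <img> has no title attribute.
        # TypeError: there is no <img>, so find() returned None.
        sim_team_win = "-"
    winners.append(sim_team_win)

print(winners)
```

Note that in the original loop the sim_margin line sits right after this assignment, so rows without a simulation result may need similar handling there.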

