How to calculate a win streak in Python/Pandas

I'm trying to calculate each team's win streak or losing streak going into a game. My goal is to generate a betting decision based on these streak factors or a recent record. I am new to Python and Pandas (and programming in general), so any detailed explanation of what the code does would be welcome.

Here's my data

    Season               Game Date                   Game Index  Away Team               Away Score  Home Team             Home Score  Winner                Loser
 0  2014 Regular Season  Saturday, March 22, 2014    2014032201  Los Angeles Dodgers              3  Arizona D'Backs                1  Los Angeles Dodgers   Arizona D'Backs
 1  2014 Regular Season  Sunday, March 23, 2014      2014032301  Los Angeles Dodgers              7  Arizona D'Backs                5  Los Angeles Dodgers   Arizona D'Backs
 2  2014 Regular Season  Sunday, March 30, 2014      2014033001  Los Angeles Dodgers              1  San Diego Padres               3  San Diego Padres      Los Angeles Dodgers
 3  2014 Regular Season  Monday, March 31, 2014      2014033101  Seattle Mariners                10  Los Angeles Angels             3  Seattle Mariners      Los Angeles Angels
 4  2014 Regular Season  Monday, March 31, 2014      2014033102  San Francisco Giants             9  Arizona D'Backs                8  San Francisco Giants  Arizona D'Backs
 5  2014 Regular Season  Monday, March 31, 2014      2014033103  Boston Red Sox                   1  Baltimore Orioles              2  Baltimore Orioles     Boston Red Sox
 6  2014 Regular Season  Monday, March 31, 2014      2014033104  Minnesota Twins                  3  Chicago White Sox              5  Chicago White Sox     Minnesota Twins
 7  2014 Regular Season  Monday, March 31, 2014      2014033105  St. Louis Cardinals              1  Cincinnati Reds                0  St. Louis Cardinals   Cincinnati Reds
 8  2014 Regular Season  Monday, March 31, 2014      2014033106  Kansas City Royals               3  Detroit Tigers                 4  Detroit Tigers        Kansas City Royals
 9  2014 Regular Season  Monday, March 31, 2014      2014033107  Colorado Rockies                 1  Miami Marlins                 10  Miami Marlins         Colorado Rockies

Dictionary below:

{'Away Score': {0: 3, 1: 7, 2: 1, 3: 10, 4: 9},
 'Away Team': {0: 'Los Angeles Dodgers',
  1: 'Los Angeles Dodgers',
  2: 'Los Angeles Dodgers',
  3: 'Seattle Mariners',
  4: 'San Francisco Giants'},
 'Game Date': {0: 'Saturday, March 22, 2014',
  1: 'Sunday, March 23, 2014',
  2: 'Sunday, March 30, 2014',
  3: 'Monday, March 31, 2014',
  4: 'Monday, March 31, 2014'},
 'Game Index': {0: 2014032201,
  1: 2014032301,
  2: 2014033001,
  3: 2014033101,
  4: 2014033102},
 'Home Score': {0: 1, 1: 5, 2: 3, 3: 3, 4: 8},
 'Home Team': {0: "Arizona D'Backs",
  1: "Arizona D'Backs",
  2: 'San Diego Padres',
  3: 'Los Angeles Angels',
  4: "Arizona D'Backs"},
 'Loser': {0: "Arizona D'Backs",
  1: "Arizona D'Backs",
  2: 'Los Angeles Dodgers',
  3: 'Los Angeles Angels',
  4: "Arizona D'Backs"},
 'Season': {0: '2014 Regular Season',
  1: '2014 Regular Season',
  2: '2014 Regular Season',
  3: '2014 Regular Season',
  4: '2014 Regular Season'},
 'Winner': {0: 'Los Angeles Dodgers',
  1: 'Los Angeles Dodgers',
  2: 'San Diego Padres',
  3: 'Seattle Mariners',
  4: 'San Francisco Giants'}}

I've tried looping through each season and each team, and then building a streak count based on this GitHub project: https://github.com/nhcamp/EPL-Betting/blob/master/EPL%20Match%20Results%20DF.ipynb

I run into key errors early in building my loops, and I have trouble identifying the data:

game_table = pd.read_csv('MLB_Scores_2014_2018.csv')

# Get Team List
team_list = game_table['Away Team'].unique()

# Get Season List
season_list = game_table['Season'].unique()

#Defining "chunks" to append gamedata to the total dataframe
chunks = []

for season in season_list:
    # Looping through seasons. Streaks reset for each season
    season_games = game_table[game_table['Season'] == season]

    for team in team_list:
        # Looping through teams
        season_team_games = season_games[(season_games['Away Team'] == team | season_games['Home Team'] == team)]

        #Setting streak list and streak counter values
        streak_list = []
        streak = 0

        # Looping through each game
        for game in season_team_games.iterrow():
            # Check if team is a winner, and up the streak
            if game_table['Winner'] == team:
                streak_list.append(streak)
                streak += 1
            # If not the winner, append streak and set to zero
            elif game_table['Winner'] != team:
                streak_list.append(streak)
                streak = 0
            # Just in case something weird happens with the scores
            else:
                streak_list.append(streak)
        game_table['Streak'] = streak_list
        chunk_list.append(game_table)

And that's kind of where I lose it. How do I append separately if each team is the home team or the away team? Is there a better way to display this data?

As a general matter, I want to add a win-streak and/or losing-streak for each team in each game. Headers would look like this:

| Season | Game Date | Game Index | Away Team | Away Score | Home Team | Home Score | Winner | Loser | Away Win Streak | Away Lose Streak | Home Win Streak | Home Lose Streak |

Edit: this error message has been resolved

I also get an error when creating the dataframe 'season_team_games':

TypeError: cannot compare a dtyped [object] array with a scalar of type [bool]

The error you are seeing comes from the statement

season_team_games = season_games[(season_games['Away Team'] == team | season_games['Home Team'] == team)]

When you combine two boolean conditions, you need to wrap each one in parentheses. This is because the | operator takes precedence over the == operator, so without parentheses Python evaluates team | season_games['Home Team'] first. The statement should become:

season_team_games = season_games[(season_games['Away Team'] == team) | (season_games['Home Team'] == team)]
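To see the fix in isolation, here is a minimal sketch on a three-row frame taken from the sample data in the question. The intermediate names is_away and is_home are only there for illustration; naming each mask separately is another way to avoid the precedence problem.

```python
import pandas as pd

# Three rows borrowed from the sample data in the question.
season_games = pd.DataFrame({
    "Away Team": ["Los Angeles Dodgers", "Seattle Mariners", "San Francisco Giants"],
    "Home Team": ["Arizona D'Backs", "Los Angeles Angels", "Arizona D'Backs"],
})
team = "Arizona D'Backs"

# Build each boolean mask first, so == is evaluated before |.
is_away = season_games["Away Team"] == team
is_home = season_games["Home Team"] == team

# Keep the rows where the team played either at home or away.
season_team_games = season_games[is_away | is_home]
print(season_team_games)
```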

I know there is more to the question than this error, but as mentioned in the comment, once you provide some text-based data, it might be easier to help.
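For the streak calculation itself, the following is only a rough sketch, not a solution tested on the full dataset. It uses the five sample rows from the dictionary in the question and walks through the games once, keeping a running streak per (season, team) pair. It assumes that Game Index sorts the games chronologically and that no game ends in a tie. The function name add_streaks and the two helper dictionaries are made up for illustration.

```python
import pandas as pd

# The five sample rows from the question (only the columns the sketch needs).
game_table = pd.DataFrame({
    "Season": ["2014 Regular Season"] * 5,
    "Game Index": [2014032201, 2014032301, 2014033001, 2014033101, 2014033102],
    "Away Team": ["Los Angeles Dodgers", "Los Angeles Dodgers", "Los Angeles Dodgers",
                  "Seattle Mariners", "San Francisco Giants"],
    "Home Team": ["Arizona D'Backs", "Arizona D'Backs", "San Diego Padres",
                  "Los Angeles Angels", "Arizona D'Backs"],
    "Winner": ["Los Angeles Dodgers", "Los Angeles Dodgers", "San Diego Padres",
               "Seattle Mariners", "San Francisco Giants"],
})


def add_streaks(games):
    """Add each team's win/lose streak going INTO every game (hypothetical helper)."""
    # Assumption: Game Index (YYYYMMDDNN) orders the games chronologically.
    games = games.sort_values("Game Index").reset_index(drop=True)

    win_run = {}   # (season, team) -> current consecutive wins
    lose_run = {}  # (season, team) -> current consecutive losses
    new_cols = {"Away Win Streak": [], "Away Lose Streak": [],
                "Home Win Streak": [], "Home Lose Streak": []}

    rows = zip(games["Season"], games["Away Team"], games["Home Team"], games["Winner"])
    for season, away, home, winner in rows:
        for side, team in (("Away", away), ("Home", home)):
            key = (season, team)  # keying on season resets streaks each season
            # Record the streak before this game's result is counted.
            new_cols[f"{side} Win Streak"].append(win_run.get(key, 0))
            new_cols[f"{side} Lose Streak"].append(lose_run.get(key, 0))
            # Update the running streaks with this game's result.
            # Assumption: no ties, so a team that is not the winner lost.
            if team == winner:
                win_run[key] = win_run.get(key, 0) + 1
                lose_run[key] = 0
            else:
                lose_run[key] = lose_run.get(key, 0) + 1
                win_run[key] = 0

    for name, values in new_cols.items():
        games[name] = values
    return games


result = add_streaks(game_table)
print(result[["Away Team", "Home Team", "Away Win Streak", "Home Lose Streak"]])
```

Because the streak is recorded before the result is counted, each row only reflects information that was available before the game was played, which matters if the columns feed a betting decision.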
