
Create list with sequential counter of each unique team's wins

Say I have a dataset with the columns home_team, away_team, home_win and away_win, where home_win and away_win tell which team won the game. Like this:

Home_team     Away_Team     Home_Win     Away_Win    gameID
   TB            CLB            1            0         1
   NY            ARZ            0            1         2
   EDM           CAN            1            0         3
   NY            TB             0            1         4
   NY            CLB            1            0         5
   TB            NY             1            0         6

How do you write a sequential counter that counts a team's total wins from previous games, regardless of whether the team was home or away? For gameID 1, each team has 0 total wins. Since TB won the first game, it has 1 win going into its second game, against NY (gameID 4), while NY has 0 previous wins.

So the data would look like this (AT = Away_Team, HT = Home_Team):

Home_team     Away_Team     Home_Win     Away_Win    gameID    HT'sTotWins      AT'sTotWins
   TB            CLB            1            0         1            0               0
   NY            ARZ            0            1         2            0               0
   EDM           CAN            1            0         3            0               0
   NY            TB             0            1         4            0               1
   NY            CLB            1            0         5            0               0
   TB            NY             1            0         6            2               1

I've read a bit about GroupBy.cumcount(), but I don't know how to write the conditions. I hope it's clear what I want to do; if not, please tell me.
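For reference, GroupBy.cumcount() numbers the rows of each group 0, 1, 2, ... in their original order, so for any row it equals the number of earlier rows in the same group. If each row held the winner of a game, that number would be the team's wins before that game; the difficulty here is that a team sits in two different columns. A minimal illustration on a hypothetical Series of winners:

```python
import pandas as pd

# Hypothetical sequence of game winners, in game order
winners = pd.Series(['TB', 'ARZ', 'TB', 'NY', 'TB'])

# For each row: how many earlier rows name the same team
earlier = winners.groupby(winners).cumcount()
print(earlier.tolist())  # [0, 0, 1, 0, 2]
```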

To be more instructive, I extended your source data to 10 games and shortened the column names to keep the printout narrow.

The first part of the script, which generates the source DataFrame, is as follows:

import pandas as pd

# Source data
df = pd.DataFrame(data=[
    [ 1, 'TB',  'CLB', 1], [ 2, 'NY',  'ARZ', 0],
    [ 3, 'EDM', 'CAN', 1], [ 4, 'NY',  'TB',  0],
    [ 5, 'NY',  'CLB', 1], [ 6, 'TB',  'NY',  1],
    [ 7, 'ARZ', 'CAN', 1], [ 8, 'ARZ', 'TB',  0],
    [ 9, 'NY',  'EDM', 1], [10, 'TB',  'CAN', 1]],
    columns=['gameID', 'HomeTeam', 'AwayTeam', 'HomeWin']).set_index('gameID')
df['AwayWin'] = 1 - df['HomeWin']

Because the winning team can be in either HomeTeam or AwayTeam, there is no simple way to use a single groupby. You have to use it twice, once for each result column.

To generate HTWins (the home team's total wins), use:

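# Winner of each game: the home team if it won, otherwise the away team
hWin = df.HomeTeam.where(df.HomeWin == 1, df.AwayTeam)
# Number of games this winner had already won (0 for its first win)
hCnt = hWin.groupby(hWin).cumcount()
# Keep that count only where the home team won; otherwise 0
df['HTWins'] = hCnt.where(df.HomeWin == 1, 0)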
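To see what these three lines compute, here is a sketch that prints each intermediate Series for the question's first six games, using the shortened column names from above. It shows that hWin is simply the winner of each game and hCnt is how many games that winner had already won; HTWins then keeps the count only on rows the home team won.

```python
import pandas as pd

# First six games from the question, with the shortened column names
df = pd.DataFrame(data=[
    [1, 'TB',  'CLB', 1], [2, 'NY', 'ARZ', 0],
    [3, 'EDM', 'CAN', 1], [4, 'NY', 'TB',  0],
    [5, 'NY',  'CLB', 1], [6, 'TB', 'NY',  1]],
    columns=['gameID', 'HomeTeam', 'AwayTeam', 'HomeWin']).set_index('gameID')

# Winner of each game: home team if it won, otherwise the away team
hWin = df.HomeTeam.where(df.HomeWin == 1, df.AwayTeam)
print(hWin.tolist())    # ['TB', 'ARZ', 'EDM', 'TB', 'NY', 'TB']

# cumcount numbers each team's appearances in hWin from 0, i.e. its earlier wins
hCnt = hWin.groupby(hWin).cumcount()
print(hCnt.tolist())    # [0, 0, 0, 1, 0, 2]

# Only rows the home team won keep their count
HTWins = hCnt.where(df.HomeWin == 1, 0)
print(HTWins.tolist())  # [0, 0, 0, 0, 0, 2]
```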

And to generate ATWins (the away team's total wins), use:

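# Winner of each game again: the away team if it won, otherwise the home team
aWin = df.AwayTeam.where(df.AwayWin == 1, df.HomeTeam)
# Number of games this winner had already won
aCnt = aWin.groupby(aWin).cumcount()
# Keep that count only where the away team won; otherwise 0
df['ATWins'] = aCnt.where(df.AwayWin == 1, 0)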

When you print(df), you will get:

       HomeTeam AwayTeam  HomeWin  AwayWin  HTWins  ATWins
gameID                                                    
1            TB      CLB        1        0       0       0
2            NY      ARZ        0        1       0       0
3           EDM      CAN        1        0       0       0
4            NY       TB        0        1       0       1
5            NY      CLB        1        0       0       0
6            TB       NY        1        0       2       0
7           ARZ      CAN        1        0       1       0
8           ARZ       TB        0        1       0       3
9            NY      EDM        1        0       1       0
10           TB      CAN        1        0       4       0

To understand how this script works, run each instruction separately and print the result.
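The HTWins/ATWins columns above carry a count only for the side that won; the losing side always gets 0. For example, in game 6 NY shows 0 although it won game 5 (the question's expected table has 1), and in game 8 ARZ shows 0 despite winning games 2 and 7. If both teams' prior wins are wanted, one option is to stack home and away into one row per (game, team), take a running sum of wins per team, subtract the current game's result, and pivot back. This is a swapped-in technique (a single groupby cumsum on a long-format table), sketched under the assumption that each team plays at most once per game; the names games, home, away, long and prior are introduced here for illustration.

```python
import pandas as pd

# Same 10-game sample as above
df = pd.DataFrame(data=[
    [ 1, 'TB',  'CLB', 1], [ 2, 'NY',  'ARZ', 0],
    [ 3, 'EDM', 'CAN', 1], [ 4, 'NY',  'TB',  0],
    [ 5, 'NY',  'CLB', 1], [ 6, 'TB',  'NY',  1],
    [ 7, 'ARZ', 'CAN', 1], [ 8, 'ARZ', 'TB',  0],
    [ 9, 'NY',  'EDM', 1], [10, 'TB',  'CAN', 1]],
    columns=['gameID', 'HomeTeam', 'AwayTeam', 'HomeWin']).set_index('gameID')
df['AwayWin'] = 1 - df['HomeWin']

# Long format: one row per (game, side) with that side's team and result
games = df.index.to_numpy()
home = pd.DataFrame({'gameID': games, 'side': 'H',
                     'team': df['HomeTeam'].to_numpy(),
                     'win': df['HomeWin'].to_numpy()})
away = pd.DataFrame({'gameID': games, 'side': 'A',
                     'team': df['AwayTeam'].to_numpy(),
                     'win': df['AwayWin'].to_numpy()})
long = pd.concat([home, away], ignore_index=True).sort_values('gameID', kind='stable')

# Wins before this game = team's running total of wins minus this game's result
long['prior'] = long.groupby('team')['win'].cumsum() - long['win']

# Back to one row per game, one column per side
prior = long.pivot(index='gameID', columns='side', values='prior')
df['HTWins'] = prior['H']
df['ATWins'] = prior['A']
print(df)
```

With this version game 6 gets HTWins 2 and ATWins 1, matching the question's expected table, and game 8 gets 2 for ARZ and 3 for TB.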

There might be a more "elegant" pandas way of doing this, but I would just break things into a for loop and go that way.

import pandas as pd

# The question's table saved as whitespace-separated text
# (sep=r'\s+' replaces delim_whitespace=True, which newer pandas deprecates)
df = pd.read_csv('sports_data.csv', header=0, sep=r'\s+')
df["HT'sTotWins"] = 0
df["AT'sTotWins"] = 0

# Running tally per team: wins earned at home and on the road
homeWinsAwayWins = {}

for index, row in df.iterrows():
    homeTeam = row['Home_team']
    awayTeam = row['Away_Team']

    # First appearance of a team: start its tally at zero
    for team in (homeTeam, awayTeam):
        homeWinsAwayWins.setdefault(team, {'home': 0, 'away': 0})

    # Record the wins each team had *before* this game
    df.loc[index, "HT'sTotWins"] = sum(homeWinsAwayWins[homeTeam].values())
    df.loc[index, "AT'sTotWins"] = sum(homeWinsAwayWins[awayTeam].values())

    # Then credit this game's winner
    if row['Home_Win']:
        homeWinsAwayWins[homeTeam]['home'] += 1
    elif row['Away_Win']:
        homeWinsAwayWins[awayTeam]['away'] += 1

print(df)

It prints what you want.
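The loop needs sports_data.csv on disk, and its separate home and away tallies are only ever summed. A self-contained variant, assuming the same column names and using io.StringIO as a stand-in for the file, can keep a single collections.Counter of wins per team; the names data, wins, ht_wins and at_wins are illustrative.

```python
import io
from collections import Counter

import pandas as pd

# The question's table; io.StringIO stands in for sports_data.csv
data = """\
Home_team Away_Team Home_Win Away_Win gameID
TB  CLB 1 0 1
NY  ARZ 0 1 2
EDM CAN 1 0 3
NY  TB  0 1 4
NY  CLB 1 0 5
TB  NY  1 0 6
"""
df = pd.read_csv(io.StringIO(data), sep=r'\s+')

wins = Counter()  # team -> wins so far; unseen teams read as 0
ht_wins, at_wins = [], []
for home, away, home_win, away_win in zip(
        df['Home_team'], df['Away_Team'], df['Home_Win'], df['Away_Win']):
    # Record both teams' wins *before* this game...
    ht_wins.append(wins[home])
    at_wins.append(wins[away])
    # ...then credit the winner (a tie credits nobody, as in the loop above)
    if home_win:
        wins[home] += 1
    elif away_win:
        wins[away] += 1

df["HT'sTotWins"] = ht_wins
df["AT'sTotWins"] = at_wins
print(df)
```

Building plain lists and assigning them once avoids the per-row df.loc writes, which matters little for six games but adds up on a full season.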
