
Use Groupby to Calculate Average if Date < X

I am trying to use a data frame of historical game statistics, like Df1 below, to build a second data frame that shows what the various column averages were going into each game (as shown in Df2). How can I use groupby, or something else, to find the averages for each team, using only the games dated before the date in that specific row? Example of the historical games data:

Df1    =     Date         Team      Opponent     Points     Points Against   1st Downs      Win?    
             4/16/20      Eagles    Ravens       10         20               10             0
             2/10/20      Eagles    Falcons      30         40               8              0
             12/15/19     Eagles    Cardinals    40         10               7              1
             11/15/19     Eagles    Giants       20         15               5              1
             10/12/19     Jets      Giants       10         18               2              1

Below is the dataframe that I'm trying to create. As you can see, it shows the averages for each column, but only over the games that happened before each game. Note: this is a simplified example of a much larger data set that I'm working with. In case the context helps, I'm trying to create this dataframe so I can analyze the correlation between the averages and whether the team won.

Df2    =     Date         Team      Opponent     Avg Pts    Avg Pts Against  Avg 1st Downs      Win %   
             4/16/20      Eagles    Ravens       25.0       21.3             7.5                75%
             2/10/20      Eagles    Falcons      30.0       12.0             6.0                100%
             12/15/19     Eagles    Cardinals    20.0       15.0             5.0                100%
             11/15/19     Eagles    Giants       NaN        NaN              NaN                NaN               
             10/12/19     Jets      Giants       NaN        NaN              NaN                NaN

Let me know if anything above isn't clear. I appreciate the help.

The easiest way is to turn your dataframe into a time series, with the dates as the index. When reading from a file, run this:

import pandas as pd
data = pd.read_csv(r'C:\Users\...csv', index_col='Date', parse_dates=True)

This example uses a CSV file. Once it is loaded, you can select the rows up to a given date:

data[:'#The Date you want to have all the dates before it']
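As a rough sketch of this idea on the question's sample rows (built in memory here instead of read from a CSV, which is an assumption), the dates become a sorted index and a strict comparison keeps only the earlier games. The names `cutoff`, `prior` and `eagles_avg` are illustrative and not part of the original answer.

```python
import pandas as pd

# Sample rows copied from Df1 in the question.
df1 = pd.DataFrame({
    "Date": ["4/16/20", "2/10/20", "12/15/19", "11/15/19", "10/12/19"],
    "Team": ["Eagles", "Eagles", "Eagles", "Eagles", "Jets"],
    "Opponent": ["Ravens", "Falcons", "Cardinals", "Giants", "Giants"],
    "Points": [10, 30, 40, 20, 10],
    "Points Against": [20, 40, 10, 15, 18],
    "1st Downs": [10, 8, 7, 5, 2],
    "Win?": [0, 0, 1, 1, 1],
})
df1["Date"] = pd.to_datetime(df1["Date"], format="%m/%d/%y")

# Use the dates as a sorted index so the frame behaves like a time series.
data = df1.set_index("Date").sort_index()

# Hypothetical cutoff: the date of the Eagles vs. Falcons game.
cutoff = pd.Timestamp("2020-02-10")

# Keep only games played strictly before the cutoff.
prior = data[data.index < cutoff]

# Average the numeric columns for one team over those earlier games.
stat_cols = ["Points", "Points Against", "1st Downs", "Win?"]
eagles_avg = prior.loc[prior["Team"] == "Eagles", stat_cols].mean()
print(eagles_avg)
```

This handles one cutoff date at a time, so filling every row of Df2 this way would mean repeating the selection once per game.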

If you want to build a Series with a time index:

index=pd.DatetimeIndex(['2014-07-04',...,'2015-08-04'])
data=pd.Series([0, 1, 2, 3], index=index)
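A small illustration of how such a Series behaves when sliced by date. The four dates below are hypothetical stand-ins chosen only for this sketch. Label slices on a sorted DatetimeIndex include the end label, so a boolean mask is used when the end date itself must be excluded.

```python
import pandas as pd

# Hypothetical dates and values, chosen only for illustration.
index = pd.DatetimeIndex(["2014-07-04", "2014-08-04", "2015-07-04", "2015-08-04"])
series = pd.Series([0, 1, 2, 3], index=index)

# Label slice: everything up to AND including 2015-07-04.
up_to = series.loc[:"2015-07-04"]

# Boolean mask: everything strictly before 2015-07-04.
before = series[series.index < pd.Timestamp("2015-07-04")]

print(up_to.tolist())
print(before.tolist())
```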

Define your own function:

import numpy as np

def aggs_under_date(df, date):
    # Each group holds the rows for one date; take its first team and opponent.
    first_team = df['Team'].iloc[0]
    first_opponent = df['Opponent'].iloc[0]

    # Only aggregate groups dated on or before the cutoff.
    if df['Date'].iloc[0] <= date:
        avg_points = df['Points'].mean()
        avg_against = df['Points Against'].mean()
        avg_downs = df['1st Downs'].mean()
        wins = df['Win?']
        win_perc = f'{wins.sum() / wins.count() * 100} %'

        return [first_team, first_opponent, avg_points, avg_against, avg_downs, win_perc]
    else:
        return [first_team, first_opponent, np.nan, np.nan, np.nan, np.nan]

Then do the groupby, applying the function you just defined:

Df1['Date'] = pd.to_datetime(Df1['Date'])  # the comparison needs real datetimes
date_max = pd.to_datetime('11/15/19')
Df1.groupby('Date').apply(aggs_under_date, date_max)
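For the per-row "everything before this game" averages that the question asks for, one alternative (not the approach in the answer above) is a per-team expanding mean shifted by one row. The sketch below assumes each team plays at most one game per date. Names such as `stats`, `prior_avg` and `df2` are illustrative. The numbers are whatever the five Df1 rows produce, so they need not match every hand-typed figure in Df2.

```python
import pandas as pd

# Sample rows copied from Df1 in the question.
df1 = pd.DataFrame({
    "Date": ["4/16/20", "2/10/20", "12/15/19", "11/15/19", "10/12/19"],
    "Team": ["Eagles", "Eagles", "Eagles", "Eagles", "Jets"],
    "Opponent": ["Ravens", "Falcons", "Cardinals", "Giants", "Giants"],
    "Points": [10, 30, 40, 20, 10],
    "Points Against": [20, 40, 10, 15, 18],
    "1st Downs": [10, 8, 7, 5, 2],
    "Win?": [0, 0, 1, 1, 1],
})
df1["Date"] = pd.to_datetime(df1["Date"], format="%m/%d/%y")
stats = ["Points", "Points Against", "1st Downs", "Win?"]

# Order each team's games chronologically.
df1 = df1.sort_values(["Team", "Date"])

# Within each team, take the running mean up to and including each game,
# then shift down one row so a game only sees the games before it.
prior_avg = df1.groupby("Team")[stats].transform(
    lambda s: s.expanding().mean().shift()
)

# Attach the averages to the identifying columns, newest game first.
df2 = pd.concat(
    [df1[["Date", "Team", "Opponent"]], prior_avg.add_prefix("Avg ")],
    axis=1,
)
df2 = df2.sort_values("Date", ascending=False).reset_index(drop=True)
print(df2)
```

A team's first game has no earlier games, so its averages come out as NaN. The "Avg Win?" column is a fraction between 0 and 1 and can be multiplied by 100 to get a percentage.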
