
Pandas Groupby with calculating ewm not working as expected

Let's say I have a dataframe like the one below:

import pandas as pd

data = {'team': ['team1','team1','team1','team1','team1','team1','team1','team1','team1','team1','team1','team1','team1','team1',
              'team2','team2','team2','team2','team2','team2','team2','team2','team2','team2','team2','team2','team2','team2',],
     'score': [1,2,3,4,5,6,7,8,9,10,11,12,13,14,1,2,3,4,5,6,7,8,9,10,11,12,13,14],
     'yards': [10,20,30,40,50,60,70,80,90,100,110,120,130,140,10,20,30,40,50,60,70,80,90,100,110,120,130,140]}

df = pd.DataFrame.from_dict(data)

I am trying to calculate an ewm for the 'score' and 'yards' columns using the manual method found in this post (Does Pandas calculate ewm wrong?), but I notice that my span is not applied separately to each grouped team. This is my code so far:

ema_features = df[['team']].copy()

for feature_name in df[['score','yards']]:
    span=10
    feature_ema = (df.groupby('team')[feature_name].rolling(window=span, min_periods=span).mean()[:span])
    rest = df[feature_name][span:]
    x = pd.concat([feature_ema, rest]).ewm(span=span, adjust=False).mean()


    ema_features[feature_name] = x

The output of this is as follows:

ema_features

    team    score   yards
0   team1   NaN NaN
1   team1   NaN NaN
2   team1   NaN NaN
3   team1   NaN NaN
4   team1   NaN NaN
5   team1   NaN NaN
6   team1   NaN NaN
7   team1   NaN NaN
8   team1   NaN NaN
9   team1   NaN NaN
10  team1   6.500000    65.000000
11  team1   7.500000    75.000000
12  team1   8.500000    85.000000
13  team1   9.500000    95.000000
14  team2   7.954545    79.545455
15  team2   6.871901    68.719008
16  team2   6.167919    61.679189
17  team2   5.773752    57.737518
18  team2   5.633070    56.330696
19  team2   5.699784    56.997843
20  team2   5.936187    59.361871
21  team2   6.311426    63.114258
22  team2   6.800257    68.002575
23  team2   7.382029    73.820289
24  team2   8.039842    80.398418
25  team2   8.759871    87.598706
26  team2   9.530803    95.308032
27  team2   10.343384   103.433844

My question is: how do I make my span apply to team 2 as well? In the output above, team 2's ewm is calculated as a continuation of team 1's. I would like each team's ewm calculated independently, with the span applied starting from that team's own first row, like what I am expecting below.

   ema_features

        team    score   yards
    0   team1   NaN NaN
    1   team1   NaN NaN
    2   team1   NaN NaN
    3   team1   NaN NaN
    4   team1   NaN NaN
    5   team1   NaN NaN
    6   team1   NaN NaN
    7   team1   NaN NaN
    8   team1   NaN NaN
    9   team1   NaN NaN
    10  team1   6.500000    65.000000
    11  team1   7.500000    75.000000
    12  team1   8.500000    85.000000
    13  team1   9.500000    95.000000
    14  team2   NaN NaN
    15  team2   NaN NaN
    16  team2   NaN NaN
    17  team2   NaN NaN
    18  team2   NaN NaN
    19  team2   NaN NaN
    20  team2   NaN NaN
    21  team2   NaN NaN
    22  team2   NaN NaN
    23  team2   6.500000    65.000000
    24  team2   7.500000    75.000000
    25  team2   8.500000    85.000000
    26  team2   9.500000    95.000000

You could try using GroupBy.apply with a custom function. Adapting your for loop, try something like this:

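def team_ema(team, span=10):
    # Seed with the simple moving average of the first `span` rows (NaN until the window fills)
    feature_ema = team.rolling(window=span, min_periods=span).mean()[:span]
    # Remaining raw rows of this team, appended after the seed
    rest = team[span:]
    return pd.concat([feature_ema, rest]).ewm(span=span, adjust=False).mean()

# Select only the numeric columns so the 'team' strings are not passed to rolling()/ewm()
df.groupby('team', group_keys=False)[['score', 'yards']].apply(team_ema)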
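
As a rough end-to-end check of the per-team approach above, the sketch below rebuilds the question's data and joins the result back onto the team column. The use of `group_keys=False`, the explicit `[['score', 'yards']]` column selection, `.iloc` slicing, and the final `join` are my own assumptions about how to keep the row index aligned with `df`; they are not part of the original answer.

```python
import pandas as pd

# Same data as in the question: two teams, 14 rows each.
df = pd.DataFrame({
    'team': ['team1'] * 14 + ['team2'] * 14,
    'score': list(range(1, 15)) * 2,
    'yards': list(range(10, 150, 10)) * 2,
})

def team_ema(group, span=10):
    # Seed: simple moving average over the first `span` rows of this team
    # (NaN until the window is full, then the plain mean at row `span - 1`).
    seed = group.rolling(window=span, min_periods=span).mean().iloc[:span]
    # Raw values of this team after the seed window.
    rest = group.iloc[span:]
    # Recursive EWM (adjust=False) starting from the seed value.
    return pd.concat([seed, rest]).ewm(span=span, adjust=False).mean()

# group_keys=False keeps the original row index, so the result aligns with df.
ema = df.groupby('team', group_keys=False)[['score', 'yards']].apply(team_ema)

# Attach the per-team EMA columns to the team labels.
ema_features = df[['team']].join(ema)
print(ema_features)
```

With this data, each team's first nine rows stay NaN, the tenth row holds the seed (the mean of that team's first ten scores, 5.5), and the next row follows from the recursion with alpha = 2 / (span + 1) = 2/11: 5.5 * 9/11 + 11 * 2/11 = 6.5. Team 2 starts again from its own seed rather than continuing from team 1.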
