Pandas Groupby with calculating ewm not working as expected
Let's say I have a dataframe like the one below:
import pandas as pd
data = {'team': ['team1','team1','team1','team1','team1','team1','team1','team1','team1','team1','team1','team1','team1','team1',
'team2','team2','team2','team2','team2','team2','team2','team2','team2','team2','team2','team2','team2','team2',],
'score': [1,2,3,4,5,6,7,8,9,10,11,12,13,14,1,2,3,4,5,6,7,8,9,10,11,12,13,14],
'yards': [10,20,30,40,50,60,70,80,90,100,110,120,130,140,10,20,30,40,50,60,70,80,90,100,110,120,130,140]}
df = pd.DataFrame.from_dict(data)
I am trying to calculate the ewm for the 'score' and 'yards' columns using the manual method from this post (Does Pandas calculate ewm wrong?), but I notice my span does not work as intended for each grouped team. This is what I have for my code so far:
ema_features = df[['team']].copy()
for feature_name in df[['score','yards']]:
    span = 10
    feature_ema = (df.groupby('team')[feature_name].rolling(window=span, min_periods=span).mean()[:span])
    rest = df[feature_name][span:]
    x = pd.concat([feature_ema, rest]).ewm(span=span, adjust=False).mean()
    ema_features[feature_name] = x
The output of this is as follows:
ema_features
team score yards
0 team1 NaN NaN
1 team1 NaN NaN
2 team1 NaN NaN
3 team1 NaN NaN
4 team1 NaN NaN
5 team1 NaN NaN
6 team1 NaN NaN
7 team1 NaN NaN
8 team1 NaN NaN
9 team1 NaN NaN
10 team1 6.500000 65.000000
11 team1 7.500000 75.000000
12 team1 8.500000 85.000000
13 team1 9.500000 95.000000
14 team2 7.954545 79.545455
15 team2 6.871901 68.719008
16 team2 6.167919 61.679189
17 team2 5.773752 57.737518
18 team2 5.633070 56.330696
19 team2 5.699784 56.997843
20 team2 5.936187 59.361871
21 team2 6.311426 63.114258
22 team2 6.800257 68.002575
23 team2 7.382029 73.820289
24 team2 8.039842 80.398418
25 team2 8.759871 87.598706
26 team2 9.530803 95.308032
27 team2 10.343384 103.433844
My question is, how do I make my span apply to team 2 as well? In the output above, team 2's ewm is calculated as a continuation of team 1's. I would like each team's ewm to be calculated independently of the other, which requires the span to be applied to each team separately before the ewm is computed, as in the expected output below.
ema_features
team score yards
0 team1 NaN NaN
1 team1 NaN NaN
2 team1 NaN NaN
3 team1 NaN NaN
4 team1 NaN NaN
5 team1 NaN NaN
6 team1 NaN NaN
7 team1 NaN NaN
8 team1 NaN NaN
9 team1 NaN NaN
10 team1 6.500000 65.000000
11 team1 7.500000 75.000000
12 team1 8.500000 85.000000
13 team1 9.500000 95.000000
14 team2 NaN NaN
15 team2 NaN NaN
16 team2 NaN NaN
17 team2 NaN NaN
18 team2 NaN NaN
19 team2 NaN NaN
20 team2 NaN NaN
21 team2 NaN NaN
22 team2 NaN NaN
23 team2 6.500000 65.000000
24 team2 7.500000 75.000000
25 team2 8.500000 85.000000
26 team2 9.500000 95.000000
You could try using GroupBy.apply with a custom function. Adapting your for loop, try something like this:
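def team_ema(team, span=10):
    # Simple moving average over the first `span` rows seeds the EMA
    feature_ema = team.rolling(window=span, min_periods=span).mean()[:span]
    # Remaining raw rows of this team follow the seed
    rest = team[span:]
    return pd.concat([feature_ema, rest]).ewm(span=span, adjust=False).mean()

# Select the numeric columns so the string 'team' column is not passed to rolling/ewm
df.groupby('team')[['score', 'yards']].apply(team_ema)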
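For reference, here is a minimal self-contained sketch of the same idea that writes the result back into an `ema_features` frame aligned with the original rows. The helper name `sma_seeded_ema` and the use of `transform` on one column at a time are my own choices for illustration, not part of the answer above; `transform` is used only because it returns values on the original index, so the assignment lines up row by row.

```python
import pandas as pd

# Same data as in the question, built more compactly.
df = pd.DataFrame({
    "team": ["team1"] * 14 + ["team2"] * 14,
    "score": list(range(1, 15)) * 2,
    "yards": list(range(10, 150, 10)) * 2,
})


def sma_seeded_ema(values, span=10):
    """EMA of one team's values, seeded with the simple mean of the first `span` rows.

    Hypothetical helper for illustration; `values` is a single team's Series.
    """
    # Rolling mean over the first `span` rows: NaN except at the last of them.
    seed = values.rolling(window=span, min_periods=span).mean().iloc[:span]
    # Raw values after the seed window.
    rest = values.iloc[span:]
    # Recursive (adjust=False) EMA that starts from the seed value.
    return pd.concat([seed, rest]).ewm(span=span, adjust=False).mean()


ema_features = df[["team"]].copy()
for col in ["score", "yards"]:
    # transform calls the helper once per team and keeps the original index.
    ema_features[col] = df.groupby("team")[col].transform(sma_seeded_ema)

print(ema_features)
```

With this sketch each team restarts its own warm-up: the first nine rows of each team are NaN, the tenth row holds the seed average itself (5.5 for score), and the following rows continue with 6.5, 7.5, 8.5, 9.5. If the seed row should also be NaN, as in the expected output in the question, it would need to be masked afterwards.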