Pandas Groupby with calculating ewm not working as expected
Suppose I have a DataFrame like the following:
import pandas as pd
data = {'team': ['team1','team1','team1','team1','team1','team1','team1','team1','team1','team1','team1','team1','team1','team1',
'team2','team2','team2','team2','team2','team2','team2','team2','team2','team2','team2','team2','team2','team2',],
'score': [1,2,3,4,5,6,7,8,9,10,11,12,13,14,1,2,3,4,5,6,7,8,9,10,11,12,13,14],
'yards': [10,20,30,40,50,60,70,80,90,100,110,120,130,140,10,20,30,40,50,60,70,80,90,100,110,120,130,140]}
df = pd.DataFrame.from_dict(data)
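I am trying to compute an EWM with the manual method from this post ("Does Pandas calculate ewm wrong?"), but I noticed that my span is not applied separately to each team group. This is my code so far: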
ema_features = df[['team']].copy()
for feature_name in df[['score','yards']]:
    span = 10
    feature_ema = (df.groupby('team')[feature_name].rolling(window=span, min_periods=span).mean()[:span])
    rest = df[feature_name][span:]
    x = pd.concat([feature_ema, rest]).ewm(span=span, adjust=False).mean()
    ema_features[feature_name] = x
The output is as follows:
ema_features
team score yards
0 team1 NaN NaN
1 team1 NaN NaN
2 team1 NaN NaN
3 team1 NaN NaN
4 team1 NaN NaN
5 team1 NaN NaN
6 team1 NaN NaN
7 team1 NaN NaN
8 team1 NaN NaN
9 team1 NaN NaN
10 team1 6.500000 65.000000
11 team1 7.500000 75.000000
12 team1 8.500000 85.000000
13 team1 9.500000 95.000000
14 team2 7.954545 79.545455
15 team2 6.871901 68.719008
16 team2 6.167919 61.679189
17 team2 5.773752 57.737518
18 team2 5.633070 56.330696
19 team2 5.699784 56.997843
20 team2 5.936187 59.361871
21 team2 6.311426 63.114258
22 team2 6.800257 68.002575
23 team2 7.382029 73.820289
24 team2 8.039842 80.398418
25 team2 8.759871 87.598706
26 team2 9.530803 95.308032
27 team2 10.343384 103.433844
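My question is: how can I make the span apply to team2 as well? In the output above, team2's EWM is computed as a continuation of team1's. I want each team's EWM to be computed independently, which means applying the span to each team on its own before computing, as in my expected output below.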
ema_features
team score yards
0 team1 NaN NaN
1 team1 NaN NaN
2 team1 NaN NaN
3 team1 NaN NaN
4 team1 NaN NaN
5 team1 NaN NaN
6 team1 NaN NaN
7 team1 NaN NaN
8 team1 NaN NaN
9 team1 NaN NaN
10 team1 6.500000 65.000000
11 team1 7.500000 75.000000
12 team1 8.500000 85.000000
13 team1 9.500000 95.000000
14 team2 NaN NaN
15 team2 NaN NaN
16 team2 NaN NaN
17 team2 NaN NaN
18 team2 NaN NaN
19 team2 NaN NaN
20 team2 NaN NaN
21 team2 NaN NaN
22 team2 NaN NaN
23 team2 6.500000 65.000000
24 team2 7.500000 75.000000
25 team2 8.500000 85.000000
26 team2 9.500000 95.000000
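The non-NaN values in the expected output can be checked by hand for a single team. A minimal sketch, assuming the EMA is seeded with the simple mean of the team's first `span` scores and then updated with alpha = 2 / (span + 1), which is the recursion that `ewm(span=span, adjust=False)` uses; the variable names are illustrative only:

```python
# Hand-rolled check of one team's values (illustrative names, not from the original post)
span = 10
alpha = 2 / (span + 1)            # smoothing factor implied by span
scores = list(range(1, 15))       # one team's scores: 1..14

# Seed: simple mean of the first `span` scores
ema = [sum(scores[:span]) / span]

# Recursion used when adjust=False: y_t = (1 - alpha) * y_{t-1} + alpha * x_t
for x in scores[span:]:
    ema.append((1 - alpha) * ema[-1] + alpha * x)

print(ema)  # the seed, then one value per remaining score
```

The seed is 5.5 and the following values are approximately 6.5, 7.5, 8.5 and 9.5, which is the sequence each team should show independently.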
You can try using GroupBy.apply with a custom function. Adapting your for loop, try something like this:
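def team_ema(team, span=10):
    # Seed with the simple moving average of the first `span` rows, then continue with the raw values
    feature_ema = team.rolling(window=span, min_periods=span).mean()[:span]
    rest = team[span:]
    return pd.concat([feature_ema, rest]).ewm(span=span, adjust=False).mean()

# Select the numeric columns explicitly so rolling() never sees the 'team' strings
df.groupby('team')[['score', 'yards']].apply(team_ema)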
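As an alternative that keeps the original row index, the same seeding idea can be applied one column at a time with `transform` instead of `apply`. This is a sketch under stated assumptions, not the answer's exact method: the helper name `sma_seeded_ema` is hypothetical, and the sample frame is rebuilt in a compact form so the block runs on its own.

```python
import pandas as pd

def sma_seeded_ema(values: pd.Series, span: int = 10) -> pd.Series:
    """EMA for one group's values, seeded with the simple mean of the first `span` rows.

    Hypothetical helper for illustration; not part of pandas.
    """
    seed = values.rolling(window=span, min_periods=span).mean().iloc[:span]
    rest = values.iloc[span:]
    return pd.concat([seed, rest]).ewm(span=span, adjust=False).mean()

# Same data as in the question, written compactly
df = pd.DataFrame({
    "team": ["team1"] * 14 + ["team2"] * 14,
    "score": list(range(1, 15)) * 2,
    "yards": list(range(10, 150, 10)) * 2,
})

ema_features = df[["team"]].copy()
for feature_name in ["score", "yards"]:
    # transform calls the helper once per team and aligns the result to df's index
    ema_features[feature_name] = df.groupby("team")[feature_name].transform(sma_seeded_ema)

print(ema_features)
```

Because each team is processed separately, team2 starts again from NaN instead of continuing team1's average. Note that with this seeding the tenth row of each team holds the seed itself (the mean of the first ten values) rather than NaN, so it differs from the expected table by that one row per team.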