Applying multiple rolling functions to multiple columns of a pandas groupby rolling object?
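I'm looking for a way to do the following:
Group a dataframe.
For each group, generate time windows (given a unit of time).
In the resulting structure, take each column and apply multiple rolling summary statistic functions, so that the result has summary statistics for each group/time-window combination.
Here is a sample dataset: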
gps_time,name,val_x,val_y
2017-07-04 11:20:23.423,bob,0.963,0.201
2017-07-04 11:20:24.492,bob,0.964,0.203
2017-07-04 11:20:24.499,bob,0.962,0.210
2017-07-04 11:20:25.627,sarah,0.893,0.010
2017-07-04 11:20:28.627,sarah,0.894,0.012
2017-07-04 11:20:29.613,sarah,0.895,0.014
2017-07-04 11:20:29.630,larry,-0.423,0.231
2017-07-04 11:20:30.423,larry,-0.431,0.22
2017-07-04 11:20:30.428,larry,-0.432,0.222
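To reproduce the examples, the sample can be loaded into a dataframe roughly as follows. This is a minimal sketch: the variable name `csv_text` is an assumption made for illustration, and the only real requirement is that `gps_time` ends up as a datetime column.

```python
import io

import pandas as pd

# Sample rows copied from the question; `csv_text` is an illustrative name.
csv_text = """gps_time,name,val_x,val_y
2017-07-04 11:20:23.423,bob,0.963,0.201
2017-07-04 11:20:24.492,bob,0.964,0.203
2017-07-04 11:20:24.499,bob,0.962,0.210
2017-07-04 11:20:25.627,sarah,0.893,0.010
2017-07-04 11:20:28.627,sarah,0.894,0.012
2017-07-04 11:20:29.613,sarah,0.895,0.014
2017-07-04 11:20:29.630,larry,-0.423,0.231
2017-07-04 11:20:30.423,larry,-0.431,0.22
2017-07-04 11:20:30.428,larry,-0.432,0.222
"""

# parse_dates converts gps_time to datetime64, which time-based grouping needs.
df = pd.read_csv(io.StringIO(csv_text), parse_dates=["gps_time"])
print(df.dtypes)
```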
Desired output for the data above, grouped by name with a 1-second window:
name,gps_time,val_x_mean,val_x_med,val_y_mean,val_y_med
bob,2017-07-04 11:20:23.423,0.963,0.963,0.201,0.201
bob,2017-07-04 11:20:24.492,0.963,0.963,0.2065,0.2065
sarah,2017-07-04 11:20:25.627,0.893,0.89,0.010,0.010
sarah,2017-07-04 11:20:28.627,0.8945,0.8945,0.013,0.013
larry,2017-07-04 11:20:30.423,-0.4287,-0.431,0.336,0.222
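I tried using list comprehensions to generate a bunch of dataframes, but the process is really slow, and I have to call it for every column.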
Let's use groupby together with pd.Grouper:
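# Bin rows into fixed 1-second intervals per name, then compute both statistics for each value column.
# Assumes gps_time is already a datetime column; newer pandas versions spell the 'S' alias as 's'.
df_out = df.groupby([pd.Grouper(freq='S', key='gps_time'),'name']).agg(['mean','median'])
# Flatten the (column, statistic) MultiIndex into names such as val_x_mean.
df_out.columns = df_out.columns.map('_'.join)
df_out.reset_index()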
Output:
             gps_time   name  val_x_mean  val_x_median  val_y_mean  val_y_median
0 2017-07-04 11:20:23    bob      0.9630        0.9630      0.2010        0.2010
1 2017-07-04 11:20:24    bob      0.9630        0.9630      0.2065        0.2065
2 2017-07-04 11:20:25  sarah      0.8930        0.8930      0.0100        0.0100
3 2017-07-04 11:20:28  sarah      0.8940        0.8940      0.0120        0.0120
4 2017-07-04 11:20:29  larry     -0.4230       -0.4230      0.2310        0.2310
5 2017-07-04 11:20:29  sarah      0.8950        0.8950      0.0140        0.0140
6 2017-07-04 11:20:30  larry     -0.4315       -0.4315      0.2210        0.2210
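Note that pd.Grouper puts rows into fixed, non-overlapping one-second bins, which is not the same as a rolling window. If a true trailing window per row is wanted, something along these lines might work. It is only a sketch under assumptions: the names `indexed`, `roller` and `rolled` are made up for illustration, the window is taken to be the default trailing interval (t - 1s, t], and the two statistics are computed separately and concatenated rather than passed as a list to agg.

```python
import io

import pandas as pd

# Same sample rows as in the question; `csv_text` is an illustrative name.
csv_text = """gps_time,name,val_x,val_y
2017-07-04 11:20:23.423,bob,0.963,0.201
2017-07-04 11:20:24.492,bob,0.964,0.203
2017-07-04 11:20:24.499,bob,0.962,0.210
2017-07-04 11:20:25.627,sarah,0.893,0.010
2017-07-04 11:20:28.627,sarah,0.894,0.012
2017-07-04 11:20:29.613,sarah,0.895,0.014
2017-07-04 11:20:29.630,larry,-0.423,0.231
2017-07-04 11:20:30.423,larry,-0.431,0.22
2017-07-04 11:20:30.428,larry,-0.432,0.222
"""
df = pd.read_csv(io.StringIO(csv_text), parse_dates=["gps_time"])

# Time-based rolling windows need a sorted datetime index.
indexed = df.sort_values("gps_time").set_index("gps_time")

# Within each name, every row looks back one second from its own timestamp.
roller = indexed.groupby("name")[["val_x", "val_y"]].rolling("1s")

# Compute each statistic on the same windows and place the results side by side.
rolled = pd.concat(
    [roller.mean().add_suffix("_mean"), roller.median().add_suffix("_median")],
    axis=1,
)
print(rolled.reset_index())
```

Unlike the binned result, this produces one output row per input row, indexed by (name, gps_time), so which of the two fits depends on whether the windows should be aligned to whole seconds or to each observation.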