Rolling average over fixed time-window in dataframe
I have a dataframe like this:
import pandas as pd
df = pd.DataFrame({'ID': [1,1,1,1,1,1,2,2,2,2,2,2,3,3,3,3,3,3,3,3,3],
'val': [1,2,3,1,2,3,1,2,3,4,5,6,1,2,3,4,5,6,1,2,3],
'time': [pd.Timestamp(2017, 1, 1, 12), pd.Timestamp(2017, 1, 1, 13), pd.Timestamp(2017, 1, 1, 14), pd.Timestamp(2017, 1, 2, 16), pd.Timestamp(2017, 1, 2, 17), pd.Timestamp(2017, 1, 2, 18), pd.Timestamp(2017, 1, 1, 12), pd.Timestamp(2017, 1, 1, 13), pd.Timestamp(2017, 1, 1, 14), pd.Timestamp(2017, 1, 1, 15), pd.Timestamp(2017, 1, 1, 16), pd.Timestamp(2017, 1, 2, 15), pd.Timestamp(2017, 1, 1, 12), pd.Timestamp(2017, 1, 1, 13), pd.Timestamp(2017, 1, 1, 14), pd.Timestamp(2017, 1, 1, 15), pd.Timestamp(2017, 1, 1, 16), pd.Timestamp(2017, 1, 1, 17), pd.Timestamp(2017, 1, 2, 18), pd.Timestamp(2017, 1, 2, 19), pd.Timestamp(2017, 1, 2, 20)]})
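I want to create a new column that gives, for each row, the mean of val over all rows with the same ID that fall within the 24-hour window before that row's time.
How can I do this in a pythonic way, as opposed to iterating over every row?
Expected output: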
ID val time 24hr_avg
0 1 1 2017-01-01 12:00:00 1.0 ###
1 1 2 2017-01-01 13:00:00 1.5 ##
2 1 3 2017-01-01 14:00:00 2.0 #
3 1 1 2017-01-02 16:00:00 1.0 ##
4 1 2 2017-01-02 17:00:00 1.5 ##
5 1 3 2017-01-02 18:00:00 2.0 #
6 2 1 2017-01-01 12:00:00 1.0 #####
7 2 2 2017-01-01 13:00:00 1.5 ####
8 2 3 2017-01-01 14:00:00 2.0 ###
9 2 4 2017-01-01 15:00:00 2.5 ###
10 2 5 2017-01-01 16:00:00 3.0 ##
11 2 6 2017-01-02 15:00:00 5.5 #
12 3 1 2017-01-01 12:00:00 1.0 ######
13 3 2 2017-01-01 13:00:00 1.5 #####
14 3 3 2017-01-01 14:00:00 2.0 ####
15 3 4 2017-01-01 15:00:00 2.5 ###
16 3 5 2017-01-01 16:00:00 3.0 ##
17 3 6 2017-01-01 17:00:00 3.5 #
18 3 1 2017-01-02 18:00:00 1.0 ###
19 3 2 2017-01-02 19:00:00 1.5 ##
20 3 3 2017-01-02 20:00:00 2.0 #
If you set_index the time column, you can use groupby.rolling with a 24-hour window, then merge the result back into the original data:
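# Index by time so rolling() can take a time-based window, roll per ID,
# then merge the result back on (ID, time).
# '24h' replaces '24H' because recent pandas versions deprecate the uppercase alias.
df_ = df.merge(df.set_index('time').sort_index()
                 .groupby('ID')
                 .rolling('24h')
                 ['val'].mean()
                 .rename('24hr_avg'),
               left_on=['ID', 'time'], right_index=True)
print(df_)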
ID val time 24hr_avg
0 1 1 2017-01-01 12:00:00 1.0
1 1 2 2017-01-01 13:00:00 1.5
2 1 3 2017-01-01 14:00:00 2.0
3 1 1 2017-01-02 16:00:00 1.0
4 1 2 2017-01-02 17:00:00 1.5
5 1 3 2017-01-02 18:00:00 2.0
6 2 1 2017-01-01 12:00:00 1.0
7 2 2 2017-01-01 13:00:00 1.5
8 2 3 2017-01-01 14:00:00 2.0
9 2 4 2017-01-01 15:00:00 2.5
10 2 5 2017-01-01 16:00:00 3.0
11 2 6 2017-01-02 15:00:00 5.5
12 3 1 2017-01-01 12:00:00 1.0
13 3 2 2017-01-01 13:00:00 1.5
14 3 3 2017-01-01 14:00:00 2.0
15 3 4 2017-01-01 15:00:00 2.5
16 3 5 2017-01-01 16:00:00 3.0
17 3 6 2017-01-01 17:00:00 3.5
18 3 1 2017-01-02 18:00:00 1.0
19 3 2 2017-01-02 19:00:00 1.5
20 3 3 2017-01-02 20:00:00 2.0
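Row 11 comes out as 5.5 rather than 5.0 because a time-based rolling window in pandas is closed on the right by default, so the row exactly 24 hours earlier (15:00 on 2017-01-01) is excluded. A minimal sketch using only the ID 2 rows from the question shows the difference; the `closed='both'` variant is included purely for comparison and is not part of the answer above.

```python
import pandas as pd

# The six rows with ID == 2 from the question's data
s = pd.Series(
    [1, 2, 3, 4, 5, 6],
    index=pd.to_datetime([
        '2017-01-01 12:00', '2017-01-01 13:00', '2017-01-01 14:00',
        '2017-01-01 15:00', '2017-01-01 16:00', '2017-01-02 15:00',
    ]),
)

# Default for offset windows is closed='right': the window is (t - 24h, t]
right = s.rolling('24h').mean()
# closed='both' also includes the point exactly 24 hours earlier: [t - 24h, t]
both = s.rolling('24h', closed='both').mean()

print(right.iloc[-1])  # 5.5 = (5 + 6) / 2
print(both.iloc[-1])   # 5.0 = (4 + 5 + 6) / 3
```

If the row exactly 24 hours back should count, passing `closed='both'` to `rolling` in the answer's code is one way to get that behaviour.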
We can use GroupBy.rolling:
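# to_numpy() drops the (ID, time) index, so values are assigned by position.
# This assumes df is already grouped by ID with time ascending within each ID.
df['24hr_avg'] = (
    df.set_index('time')
      .groupby('ID', sort=False)['val']
      .rolling('1D')
      .mean()
      .to_numpy()
)
print(df)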
ID val time 24hr_avg
0 1 1 2017-01-01 12:00:00 1.0
1 1 2 2017-01-01 13:00:00 1.5
2 1 3 2017-01-01 14:00:00 2.0
3 1 1 2017-01-02 16:00:00 1.0
4 1 2 2017-01-02 17:00:00 1.5
5 1 3 2017-01-02 18:00:00 2.0
6 2 1 2017-01-01 12:00:00 1.0
7 2 2 2017-01-01 13:00:00 1.5
8 2 3 2017-01-01 14:00:00 2.0
9 2 4 2017-01-01 15:00:00 2.5
10 2 5 2017-01-01 16:00:00 3.0
11 2 6 2017-01-02 15:00:00 5.5
12 3 1 2017-01-01 12:00:00 1.0
13 3 2 2017-01-01 13:00:00 1.5
14 3 3 2017-01-01 14:00:00 2.0
15 3 4 2017-01-01 15:00:00 2.5
16 3 5 2017-01-01 16:00:00 3.0
17 3 6 2017-01-01 17:00:00 3.5
18 3 1 2017-01-02 18:00:00 1.0
19 3 2 2017-01-02 19:00:00 1.5
20 3 3 2017-01-02 20:00:00 2.0
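The positional assignment above works for the question's data because the rows are already ordered by ID and by time within each ID. If that ordering is not guaranteed, one possible variant is to sort a copy first and map the result back through the original row labels. This is only a sketch: the small frame below is made up for illustration, the name `ordered` is hypothetical, and it assumes the dataframe's index is unique.

```python
import pandas as pd

# Illustrative frame whose rows are deliberately NOT ordered by ID or time
df = pd.DataFrame({
    'ID':   [2, 1, 2, 1],
    'val':  [10, 3, 20, 1],
    'time': pd.to_datetime(['2017-01-01 12:00', '2017-01-01 13:00',
                            '2017-01-01 13:00', '2017-01-01 12:00']),
})

# Sort by ID, then time, so each group has a monotonic time index
ordered = df.sort_values(['ID', 'time'])

rolled = (
    ordered.set_index('time')
    .groupby('ID')['val']
    .rolling('24h')
    .mean()
)

# rolled has the same row order as `ordered`, so reattach the original
# row labels and let pandas align on them during assignment
df['24hr_avg'] = pd.Series(rolled.to_numpy(), index=ordered.index)
print(df)
```

Here the rows for ID 1 get 2.0 (at 13:00) and 1.0 (at 12:00), and the rows for ID 2 get 10.0 and 15.0, regardless of where they sit in the frame.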