Pandas rolling over days and getting sum
This is my dataframe:
import pandas

d = {'dates': ['2020-07-16','2020-07-15','2020-07-14','2020-07-13','2020-07-16','2020-07-15','2020-07-14','2020-07-13'],
'location':['Paris','Paris','Paris','Paris','NY','NY','NY','NY'],'T':[100,200,300,400,10,20,30,40]}
df = pandas.DataFrame(data=d)
df['dates'] = pandas.to_datetime(df['dates'])
df
dates location T
0 2020-07-16 Paris 100
1 2020-07-15 Paris 200
2 2020-07-14 Paris 300
3 2020-07-13 Paris 400
4 2020-07-16 NY 10
5 2020-07-15 NY 20
6 2020-07-14 NY 30
7 2020-07-13 NY 40
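I would like to get the sum of T for a given location, rolling over the last 2 days (including the current date). This is the dataframe I want: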
dates location T SUM2D
0 2020-07-16 Paris 100 300
1 2020-07-15 Paris 200 500
2 2020-07-14 Paris 300 700
3 2020-07-13 Paris 400 NaN
4 2020-07-16 NY 10 30
5 2020-07-15 NY 20 50
6 2020-07-14 NY 30 70
7 2020-07-13 NY 40 NaN
I tried playing with this statement, but without success:
df['SUM2D'] = df.set_index('dates').groupby('location').rolling(window=2, freq='D').sum()['T'].values
Try sorting the dataframe before setting the index:
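# Sort so each location's rows are in ascending date order
df = df.sort_values(['location','dates']).set_index('dates')
# window=2 counts rows, so this assumes one row per location per day
df['SUM2D'] = df.groupby('location')['T'].rolling(window=2).sum().values
df[::-1]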
Result:
location T SUM2D
dates
2020-07-16 Paris 100 300.0
2020-07-15 Paris 200 500.0
2020-07-14 Paris 300 700.0
2020-07-13 Paris 400 NaN
2020-07-16 NY 10 30.0
2020-07-15 NY 20 50.0
2020-07-14 NY 30 70.0
2020-07-13 NY 40 NaN
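A window of 2 counts rows, not calendar days, so it only matches "the last 2 days" when every location has exactly one row per day. If dates can be missing, a time-based window may be closer to the intent. The following is a minimal sketch under that assumption, not part of the original answer: it loops over the groups, uses the offset string '2D' together with `on='dates'`, and sets `min_periods=2` so that a day with no previous-day row still gives NaN. The names `parts`, `group`, and `sums` are only illustrative.

```python
import pandas as pd

d = {'dates': ['2020-07-16', '2020-07-15', '2020-07-14', '2020-07-13',
               '2020-07-16', '2020-07-15', '2020-07-14', '2020-07-13'],
     'location': ['Paris', 'Paris', 'Paris', 'Paris', 'NY', 'NY', 'NY', 'NY'],
     'T': [100, 200, 300, 400, 10, 20, 30, 40]}
df = pd.DataFrame(data=d)
df['dates'] = pd.to_datetime(df['dates'])

parts = []
for _, group in df.groupby('location'):
    # A time-based window needs the dates in ascending order
    group = group.sort_values('dates')
    # '2D' covers the current day and the day before; min_periods=2 keeps
    # the NaN when only one observation falls inside the window
    sums = group.rolling('2D', on='dates', min_periods=2)['T'].sum()
    parts.append(sums)

# Each piece keeps the original integer index, so assignment aligns by label
df['SUM2D'] = pd.concat(parts)
print(df)
```

On the sample data this should give the same SUM2D column as the row-count window. The two would differ only where a date is missing for a location: the time-based window returns NaN there instead of summing across the gap.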
A more compact and elegant solution is to use transform:
df['SUM2D'] = df.sort_values(['dates']).groupby('location')['T'].transform(lambda x: x.rolling(2, 2).sum())
The result is now:
dates location T SUM2D
0 2020-07-16 Paris 100 300.0
1 2020-07-15 Paris 200 500.0
2 2020-07-14 Paris 300 700.0
3 2020-07-13 Paris 400 NaN
4 2020-07-16 NY 10 30.0
5 2020-07-15 NY 20 50.0
6 2020-07-14 NY 30 70.0
7 2020-07-13 NY 40 NaN
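The two positional arguments in `rolling(2, 2)` are `window` and `min_periods`. As a small illustration (using a standalone Series with Paris's T values in ascending date order, an assumption made only for this sketch), the snippet below shows how `min_periods` decides whether the first row is NaN:

```python
import pandas as pd

# Paris's T values, oldest date first (illustrative stand-in for one group)
s = pd.Series([400, 300, 200, 100])

# Require two observations: the first row has no predecessor, so it is NaN
strict = s.rolling(window=2, min_periods=2).sum()

# Allow a single observation: the first row is just its own value
lenient = s.rolling(window=2, min_periods=1).sum()

print(strict.tolist())   # [nan, 700.0, 500.0, 300.0]
print(lenient.tolist())  # [400.0, 700.0, 500.0, 300.0]
```

With `min_periods=2` the earliest date in each location stays NaN, which matches the desired output in the question.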