I'm trying to calculate a rolling sum over selected columns of my member-month structured healthcare dataset, grouped by member ID. The member IDs themselves should not be summed, and variables like gender should be excluded from the rolling sum.
For example, using the following toy data:
import pandas as pd
df = pd.DataFrame({'id': [1,1,1,2,2,2], 'a': [1,2,3,4,5,6], 'b': [10,20,30,40,50,60], 'c': [2,4,6,8,10,12]})
I've successfully calculated the rolling sums by member ID:
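# select several columns with a list (double brackets); the bare 'a','b','c' form is rejected by recent pandas
df_roll = df.groupby('id')[['a', 'b', 'c']].rolling(window=2).sum()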
df_roll
So I'm almost there, but I haven't been able to select a range of columns with a slice, as follows:
df_roll = df.groupby('id')['a':'c'].rolling(window = 2).sum()
df_roll
which is important since I have hundreds of columns in my real dataset.
You can select the columns by position first and pass that list to the groupby. Something like this:
cols = list(df.columns[1:])  # every column after 'id'
df_roll = df.groupby('id')[cols].rolling(window=2).sum()
On the other hand, if your range of columns is everything except the groupby column, you can skip the column selection altogether. Something like this:
df_roll = df.groupby('id').rolling(window = 2).sum()
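If a few non-grouping columns also have to stay out of the sum (gender, in the question), a minimal sketch is to build the column list by exclusion. The `gender` column and the `exclude` list below are made up for illustration; they are not part of the toy data in the question.

```python
import pandas as pd

df = pd.DataFrame({
    'id': [1, 1, 1, 2, 2, 2],
    'gender': ['F', 'F', 'F', 'M', 'M', 'M'],  # hypothetical column, not in the original toy data
    'a': [1, 2, 3, 4, 5, 6],
    'b': [10, 20, 30, 40, 50, 60],
    'c': [2, 4, 6, 8, 10, 12],
})

# Columns that should never be summed (assumed names)
exclude = ['id', 'gender']
cols = [c for c in df.columns if c not in exclude]

# Rolling sum within each member, over the remaining columns only
df_roll = df.groupby('id')[cols].rolling(window=2).sum()
```

With hundreds of columns, listing the handful to exclude is usually shorter than listing the ones to keep.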
(1) Use loc to select the range of columns you want, (2) group by passing df.id, and (3) apply the rolling sum:
df.loc[:, 'a':'c'].groupby(df.id).rolling(window = 2).sum() \
.reset_index() \
.drop('level_1', axis = 1)
Output:
   id     a      b     c
0   1   NaN    NaN   NaN
1   1   3.0   30.0   6.0
2   1   5.0   50.0  10.0
3   2   NaN    NaN   NaN
4   2   9.0   90.0  18.0
5   2  11.0  110.0  22.0
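If the rolling sums need to sit next to the original rows, one possible sketch is to drop the group level from the result's index and join back on the original row index. The `_roll2` suffix and the `out` name are arbitrary choices for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'id': [1, 1, 1, 2, 2, 2],
    'a': [1, 2, 3, 4, 5, 6],
    'b': [10, 20, 30, 40, 50, 60],
    'c': [2, 4, 6, 8, 10, 12],
})

# Turn the label range 'a':'c' into an explicit list of column names
cols = df.loc[:, 'a':'c'].columns.tolist()

# Rolling sum per member; the result is indexed by (id, original row label)
roll = df.groupby('id')[cols].rolling(window=2).sum()

# Drop the 'id' level so the index matches df again, then join on it
roll = roll.reset_index(level=0, drop=True).add_suffix('_roll2')
out = df[['id']].join(roll)
```

The first row of each member stays NaN because a window of 2 has only one observation there.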