
How to select a range of columns with pandas rolling window function?

I'm attempting to calculate a rolling sum of selected columns in my member-month structured healthcare dataset, grouped by member ID but not summing the member IDs, as well as excluding variables like gender from the rolling sum.

For example, using the following toy data:

import pandas as pd

df = pd.DataFrame({'id':[1,1,1,2,2,2], 'a':[1,2,3,4,5,6], 'b':[10,20,30,40,50,60], 'c':[2,4,6,8,10,12]})

I've successfully calculated the rolling sums by member ID:

df_roll = df.groupby('id')[['a','b','c']].rolling(window = 2).sum()
df_roll

So I'm almost there, but I haven't been able to select a range of columns with a label slice, as follows:

df_roll = df.groupby('id')['a':'c'].rolling(window = 2).sum()
df_roll

which is important since I have hundreds of columns in my real dataset.

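A groupby object does not accept a label slice such as 'a':'c', but you can select the columns by position first and pass the resulting list of names. Something like this:

cols = df.columns[1:].tolist()
df_roll = df.groupby('id')[cols].rolling(window = 2).sum()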
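
If the range is easier to express by label than by position, the slice can be resolved to column names with loc before grouping. The following is a minimal sketch on the toy data from the question; the variable name cols is only illustrative.

```python
import pandas as pd

df = pd.DataFrame({'id': [1, 1, 1, 2, 2, 2],
                   'a': [1, 2, 3, 4, 5, 6],
                   'b': [10, 20, 30, 40, 50, 60],
                   'c': [2, 4, 6, 8, 10, 12]})

# Resolve the label slice 'a':'c' to an explicit list of column names.
cols = df.loc[:, 'a':'c'].columns.tolist()

# A groupby object accepts a list of names, so pass the resolved list.
df_roll = df.groupby('id')[cols].rolling(window=2).sum()
```

The result is indexed by id and the original row index, with one column per selected name.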

On the other hand, if your range of columns is everything except the groupby column, you can simply omit the column selection:

df_roll = df.groupby('id').rolling(window = 2).sum()
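
When a few columns besides the group key also have to stay out of the sum (the question mentions gender), one option is to build the list of names by exclusion. This is only a sketch: the gender column and its values are hypothetical additions to the toy data, and the name exclude is illustrative.

```python
import pandas as pd

df = pd.DataFrame({'id': [1, 1, 1, 2, 2, 2],
                   'gender': ['F', 'F', 'F', 'M', 'M', 'M'],  # hypothetical column
                   'a': [1, 2, 3, 4, 5, 6],
                   'b': [10, 20, 30, 40, 50, 60],
                   'c': [2, 4, 6, 8, 10, 12]})

# Columns that must not take part in the rolling sum.
exclude = ['id', 'gender']

# Keep every other column, preserving the original column order.
cols = [c for c in df.columns if c not in exclude]

df_roll = df.groupby('id')[cols].rolling(window=2).sum()
```

A list comprehension is used here rather than Index.difference because the latter sorts the names, which may not match the column order of the original frame.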

(1) Use loc to select the range of columns you want, then (2) group by passing df.id, and (3) apply the rolling sum:

df.loc[:, 'a':'c'].groupby(df.id).rolling(window = 2).sum() \
                                 .reset_index() \
                                 .drop('level_1', axis = 1)

Output:
   id     a      b     c
0   1   NaN    NaN   NaN
1   1   3.0   30.0   6.0
2   1   5.0   50.0  10.0
3   2   NaN    NaN   NaN
4   2   9.0   90.0  18.0
5   2  11.0  110.0  22.0
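
The 'level_1' name in the snippet above is what reset_index assigns to the unnamed original row index. A variant that avoids hard-coding that name is to drop the index level before resetting; this is a sketch of that alternative, not the answer's original code.

```python
import pandas as pd

df = pd.DataFrame({'id': [1, 1, 1, 2, 2, 2],
                   'a': [1, 2, 3, 4, 5, 6],
                   'b': [10, 20, 30, 40, 50, 60],
                   'c': [2, 4, 6, 8, 10, 12]})

# Rolling sum per id over the label range 'a':'c'.
rolled = df.loc[:, 'a':'c'].groupby(df['id']).rolling(window=2).sum()

# The result has a two-level index (id, original row index).
# Drop the second level, then turn id back into a regular column.
out = rolled.droplevel(1).reset_index()
```

The frame out has the columns id, a, b and c, with the same values as the output shown above.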


 