简体   繁体   中英

Pandas Rolling With MultiIndex and GroupBy

I am looking at creating a rolling sum of the past n results for a given id . The index of the DataFrame is the id and date .

The code below works for non-time-based rolling windows, ie integers. However, does not work for time-based intervals, such as '10D' (10 days).

1.

df2['rolling_sum'] = df2.groupby(['id'])['a_column_to_rolling_sum'].apply(lambda x: x.rolling(2, 1).sum().shift())

>> Rolling Result
id               date_dt   
-2143487296      2019-07-08         NaN
                 2019-07-15    0.104478
                 2019-07-19    0.217260
-2143477291      2019-07-05         NaN
                 2019-07-10    0.238764
                 2019-07-16    0.391669
                 2019-07-22    0.255469
                 2019-07-29    0.244011

The code below is able to almost get what I want, however, when a new group is reached, it should be NaN as there cannot be a previous sum.

2.

rolling_result = (
    df2
    .reset_index(level=0)
    .groupby('id')['a_column_to_rolling_sum']
    .rolling('10D', min_periods=1)
    .sum()
    .shift(1)
)

# Add to df
df2['rolling_sum'] = rolling_result

>> Rolling Result
id               date_dt   
-2143487296      2019-07-08         NaN
                 2019-07-15    0.104478
                 2019-07-19    0.217260
-2143477291      2019-07-05    0.229506  <- Why is it not NaN!
                 2019-07-10    0.238764
                 2019-07-16    0.391669
                 2019-07-22    0.255469
                 2019-07-29    0.244011

Image of the two columns where the left is what I want and the right is what I get from 2 .

To recap: I want to group by multiple columns, including an id and date . For each of these groups, I want to create a rolling sum of the previous n days ( '10D' ) and m occurrences (integer value), such that the start of each group is NaN .

Thank you very much!

您应该从rolling中删除min_periods=1参数,这将为您提供您所寻求的(作为参考, min_periods参数记录为“窗口中需要具有值的最小观察数;否则,结果为 np.nan。” )

The problem is occurring during the shift. The rolling result needs to be grouped again, ie

rolling_result = (
    df2
    .reset_index(level=0)
    .groupby('id')['a_column_to_rolling_sum']
    .rolling('10D', min_periods=1)
    .sum()
    .groupby('id')
    .shift(1)
)

# Add to df
df2['rolling_sum'] = rolling_result

This works with multiple groupby arguments too. If you have other indices, remove all bar the date, ie, .reset_index(level=(0,1,...)) .

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM