Pandas Rolling With MultiIndex and GroupBy

Question

I am trying to create a rolling sum of the past n results for a given id. The DataFrame has a MultiIndex of id and date.

The code below works for count-based rolling windows, i.e. integers. However, it does not work for time-based windows such as '10D' (10 days).

1.

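# rolling(2, min_periods=1) per id, then shift() so each row only sees earlier rows; group_keys=False avoids an extra id index level in pandas >= 2.0
df2['rolling_sum'] = df2.groupby(['id'], group_keys=False)['a_column_to_rolling_sum'].apply(lambda x: x.rolling(2, min_periods=1).sum().shift())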

>> Rolling Result
id               date_dt   
-2143487296      2019-07-08         NaN
                 2019-07-15    0.104478
                 2019-07-19    0.217260
-2143477291      2019-07-05         NaN
                 2019-07-10    0.238764
                 2019-07-16    0.391669
                 2019-07-22    0.255469
                 2019-07-29    0.244011
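For reference, here is a minimal, self-contained sketch of attempt 1 on a small made-up frame (the ids, dates and values are hypothetical, not the asker's data). It uses transform rather than apply so the result keeps the original (id, date_dt) index and can be assigned straight back; because shift() runs inside each group, every id starts with NaN.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: two ids with made-up dates and values
idx = pd.MultiIndex.from_tuples(
    [
        (1, pd.Timestamp("2019-07-08")),
        (1, pd.Timestamp("2019-07-15")),
        (1, pd.Timestamp("2019-07-19")),
        (2, pd.Timestamp("2019-07-05")),
        (2, pd.Timestamp("2019-07-10")),
    ],
    names=["id", "date_dt"],
)
df2 = pd.DataFrame({"a_column_to_rolling_sum": [1.0, 2.0, 3.0, 10.0, 20.0]}, index=idx)

# Integer window of 2 rows per id; shift() is applied inside each group,
# so the first row of every id has no previous sum and becomes NaN
df2["rolling_sum"] = df2.groupby("id")["a_column_to_rolling_sum"].transform(
    lambda x: x.rolling(2, min_periods=1).sum().shift()
)
print(df2)
```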

The code below almost gets what I want. However, when a new group starts, its first value should be NaN, since there cannot be a previous sum.

2.

rolling_result = (
    df2
    .reset_index(level=0)
    .groupby('id')['a_column_to_rolling_sum']
    .rolling('10D', min_periods=1)
    .sum()
    .shift(1)
)

# Add to df
df2['rolling_sum'] = rolling_result

>> Rolling Result
id               date_dt   
-2143487296      2019-07-08         NaN
                 2019-07-15    0.104478
                 2019-07-19    0.217260
-2143477291      2019-07-05    0.229506  <- Why is it not NaN!
                 2019-07-10    0.238764
                 2019-07-16    0.391669
                 2019-07-22    0.255469
                 2019-07-29    0.244011

[Image: the two columns side by side; the left is what I want, the right is what I get from attempt 2.]
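A small sketch, on made-up data (hypothetical ids, dates and values), that reproduces the behaviour of attempt 2: the grouped rolling sum comes back as one Series indexed by (id, date_dt), and a plain shift(1) on that Series moves the last sum of one id into the first row of the next.

```python
import pandas as pd

# Hypothetical toy data: two ids with made-up dates and values
idx = pd.MultiIndex.from_tuples(
    [
        (1, pd.Timestamp("2019-07-08")),
        (1, pd.Timestamp("2019-07-15")),
        (1, pd.Timestamp("2019-07-19")),
        (2, pd.Timestamp("2019-07-05")),
        (2, pd.Timestamp("2019-07-10")),
    ],
    names=["id", "date_dt"],
)
df2 = pd.DataFrame({"a_column_to_rolling_sum": [1.0, 2.0, 3.0, 10.0, 20.0]}, index=idx)

rolled = (
    df2.reset_index(level=0)  # id becomes a column; date_dt stays as the index
    .groupby("id")["a_column_to_rolling_sum"]
    .rolling("10D", min_periods=1)
    .sum()
)
# rolled is a single Series; this shift ignores the id boundaries,
# so (2, 2019-07-05) receives the last sum of id 1 instead of NaN
leaky = rolled.shift(1)
print(leaky)
```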

To recap: I want to group by multiple columns, including id and date. For each group, I want a rolling sum over the previous n days ('10D') and over the previous m occurrences (an integer window), such that the first row of each group is NaN.

Thank you very much!

Answer 1

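You should remove the min_periods=1 argument from rolling; that will give you what you are looking for. (For reference, the min_periods parameter is documented as "Minimum number of observations in window required to have a value; otherwise, result is np.nan.")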
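A quick sketch of what min_periods changes, using a hypothetical three-point series. Per the pandas documentation, an integer window defaults min_periods to the window size, while an offset window such as '10D' defaults it to 1, so in this example the argument only makes a difference for the integer window.

```python
import numpy as np
import pandas as pd

# Hypothetical series with a DatetimeIndex so both window types can be used
s = pd.Series(
    [1.0, 2.0, 3.0],
    index=pd.to_datetime(["2019-07-08", "2019-07-15", "2019-07-19"]),
)

# Integer window: min_periods defaults to 2 here, so the first row is NaN
int_default = s.rolling(2).sum()
# A single observation is enough once min_periods=1
int_min1 = s.rolling(2, min_periods=1).sum()

# Offset window: min_periods already defaults to 1, so both calls agree
off_default = s.rolling("10D").sum()
off_min1 = s.rolling("10D", min_periods=1).sum()
print(int_default, int_min1, off_default, sep="\n")
```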

Answer 2

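The problem occurs during the shift. The grouped rolling sum is returned as a single Series, so .shift(1) runs across the whole Series and the last value of one id spills into the first row of the next. The rolling result needs to be grouped again before shifting, i.e.: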

rolling_result = (
    df2
    .reset_index(level=0)
    .groupby('id')['a_column_to_rolling_sum']
    .rolling('10D', min_periods=1)
    .sum()
    .groupby(level='id')  # regroup by the id index level so shift(1) stays within each id
    .shift(1)
)

# Add to df
df2['rolling_sum'] = rolling_result
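A minimal end-to-end sketch of this approach on a made-up frame (ids, dates and values are hypothetical). Compared with attempt 2, the only change is the extra groupby before shift(1), written here as groupby(level='id') to make explicit that it groups by the index level produced by the rolling step.

```python
import pandas as pd

# Hypothetical toy data: two ids with made-up dates and values
idx = pd.MultiIndex.from_tuples(
    [
        (1, pd.Timestamp("2019-07-08")),
        (1, pd.Timestamp("2019-07-15")),
        (1, pd.Timestamp("2019-07-19")),
        (2, pd.Timestamp("2019-07-05")),
        (2, pd.Timestamp("2019-07-10")),
    ],
    names=["id", "date_dt"],
)
df2 = pd.DataFrame({"a_column_to_rolling_sum": [1.0, 2.0, 3.0, 10.0, 20.0]}, index=idx)

rolling_result = (
    df2.reset_index(level=0)  # id becomes a column; date_dt stays as the index
    .groupby("id")["a_column_to_rolling_sum"]
    .rolling("10D", min_periods=1)  # sum over the trailing 10 days per id
    .sum()
    .groupby(level="id")  # regroup so the shift cannot cross ids
    .shift(1)
)
# rolling_result is indexed by (id, date_dt), like df2, so assignment aligns row by row
df2["rolling_sum"] = rolling_result
print(df2)
```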

This also works when grouping by several keys. If the index has other levels, move all of them except the date into columns, i.e. .reset_index(level=(0,1,...)), and group by the same keys in both groupby calls.
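A sketch of the multi-key variant, assuming a hypothetical second index level named store alongside id and date_dt (the data are made up). Every level except the date is moved to columns, and the same key list is reused for the rolling groupby and for the regroup before shift(1).

```python
import pandas as pd

# Hypothetical frame indexed by (id, store, date_dt); "store" is an invented extra key
idx = pd.MultiIndex.from_tuples(
    [
        (1, "a", pd.Timestamp("2019-07-08")),
        (1, "a", pd.Timestamp("2019-07-15")),
        (1, "b", pd.Timestamp("2019-07-09")),
        (1, "b", pd.Timestamp("2019-07-12")),
    ],
    names=["id", "store", "date_dt"],
)
df3 = pd.DataFrame({"a_column_to_rolling_sum": [1.0, 2.0, 5.0, 7.0]}, index=idx)

keys = ["id", "store"]
rolling_result = (
    df3.reset_index(level=keys)  # keep only date_dt in the index
    .groupby(keys)["a_column_to_rolling_sum"]
    .rolling("10D", min_periods=1)
    .sum()
    .groupby(level=keys)  # shift within each (id, store) group
    .shift(1)
)
df3["rolling_sum"] = rolling_result
print(df3)
```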

2 answers

solution1
0 2022-07-19 13:54:50

solution2
0 2022-07-20 07:00:47