简体   繁体   中英

How to Reverse Rolling Sum?

I have a rolling sum calculated on a grouped data frame but its adding up the wrong way, it is a sum of the future, when I need a sum of the past.

What am I doing wrong here?

I import the data and sort by Dimension and Date (I have tried removing the date sort already)

df = pd.read_csv('Input.csv', parse_dates=True)
df.sort_values(['Dimension','Date'])
print(df)

I then create a new column which is a multi index grouped by rolling window

new_column = df.groupby('Dimension').Value1.apply(lambda x: 
x.rolling(window=3).sum())

I then reset the index to be the same as the original

df['Sum_Value1'] = new_column.reset_index(level=0, drop=True)
print(df)

I have also tried reversing the index before the calculation, but that also failed.

Input

Dimension,Date,Value1,Value2
1,4/30/2002,10,20
1,1/31/2002,10,20
1,10/31/2001,10,20
1,7/31/2001,10,20
1,4/30/2001,10,20
1,1/31/2001,10,20
1,10/31/2000,10,20
2,4/30/2002,10,20
2,1/31/2002,10,20
2,10/31/2001,10,20
2,7/31/2001,10,20
2,4/30/2001,10,20
2,1/31/2001,10,20
2,10/31/2000,10,20
3,4/30/2002,10,20
3,1/31/2002,10,20
3,10/31/2001,10,20
3,7/31/2001,10,20
3,1/31/2001,10,20
3,10/31/2000,10,20

Output:

    Dimension        Date  Value1  Value2  Sum_Value1
0           1   4/30/2002      10      20         NaN
1           1   1/31/2002      10      20         NaN
2           1  10/31/2001      10      20        30.0
3           1   7/31/2001      10      20        30.0
4           1   4/30/2001      10      20        30.0
5           1   1/31/2001      10      20        30.0
6           1  10/31/2000      10      20        30.0
7           2   4/30/2002      10      20         NaN
8           2   1/31/2002      10      20         NaN
9           2  10/31/2001      10      20        30.0
10          2   7/31/2001      10      20        30.0
11          2   4/30/2001      10      20        30.0
12          2   1/31/2001      10      20        30.0
13          2  10/31/2000      10      20        30.0

Goal Output:

    Dimension        Date  Value1  Value2  Sum_Value1
0           1   4/30/2002      10      20        30.0
1           1   1/31/2002      10      20        30.0
2           1  10/31/2001      10      20        30.0
3           1   7/31/2001      10      20        30.0
4           1   4/30/2001      10      20        30.0
5           1   1/31/2001      10      20         NaN
6           1  10/31/2000      10      20         NaN
7           2   4/30/2002      10      20        30.0
8           2   1/31/2002      10      20        30.0
9           2  10/31/2001      10      20        30.0
10          2   7/31/2001      10      20        30.0
11          2   4/30/2001      10      20        30.0
12          2   1/31/2001      10      20         Nan
13          2  10/31/2000      10      20         NaN

您需要一个反向总和,因此在总和滚动之前反转您的系列:

lambda x: x[::-1].rolling(window=3).sum()

You can shift the result by window-1 to get the left aligned results:

df["sum_value1"] = (df.groupby('Dimension').Value1
                      .apply(lambda x: x.rolling(window=3).sum().shift(-2)))

在此处输入图片说明

向后滚动与向前滚动然后移动结果相同:

x.rolling(window=3).sum().shift(-2)
def reverse_rolling(series, window, func):
    index = series.index
    series = pd.DataFrame(series.iloc[::-1])
    series = series.rolling(window, 1).apply(func)
    series = series.iloc[::-1]

    series['index'] = index
    series = series.set_index('index')
    return series[0]

You can use

import pandas as pd

from pandas.api.indexers import FixedForwardWindowIndexer

df = pd.read_csv(r'C:\Users\xxxx\python\data.txt')

indexer = FixedForwardWindowIndexer(window_size=3)

df1 = df.join(df.groupby('Dimension')['Value1'].rolling(indexer, min_periods=3).sum().to_frame().reset_index(), rsuffix='_sum')

del df1['Dimension_sum']
del df1['level_1']

df1

Input:

    Dimension        Date  Value1  Value2
0           1   4/30/2002      10      20
1           1   1/31/2002      10      20
2           1  10/31/2001      10      20
3           1   7/31/2001      10      20
4           1   4/30/2001      10      20
5           1   1/31/2001      10      20
6           1  10/31/2000      10      20
7           2   4/30/2002      10      20
8           2   1/31/2002      10      20
9           2  10/31/2001      10      20
10          2   7/31/2001      10      20
11          2   4/30/2001      10      20
12          2   1/31/2001      10      20
13          2  10/31/2000      10      20
14          3   4/30/2002      10      20
15          3   1/31/2002      10      20
16          3  10/31/2001      10      20
17          3   7/31/2001      10      20
18          3   1/31/2001      10      20
19          3  10/31/2000      10      20

OUTPUT:

    Dimension        Date  Value1  Value2  Value1_sum
0           1   4/30/2002      10      20        30.0
1           1   1/31/2002      10      20        30.0
2           1  10/31/2001      10      20        30.0
3           1   7/31/2001      10      20        30.0
4           1   4/30/2001      10      20        30.0
5           1   1/31/2001      10      20         NaN
6           1  10/31/2000      10      20         NaN
7           2   4/30/2002      10      20        30.0
8           2   1/31/2002      10      20        30.0
9           2  10/31/2001      10      20        30.0
10          2   7/31/2001      10      20        30.0
11          2   4/30/2001      10      20        30.0
12          2   1/31/2001      10      20         NaN
13          2  10/31/2000      10      20         NaN
14          3   4/30/2002      10      20        30.0
15          3   1/31/2002      10      20        30.0
16          3  10/31/2001      10      20        30.0
17          3   7/31/2001      10      20        30.0
18          3   1/31/2001      10      20         NaN
19          3  10/31/2000      10      20         NaN

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM