简体   繁体   中英

Rolling sum on a dynamic window

I am new to python and the last time I coded was in the mid-80's so I appreciate your patient help.

It seems .rolling(window) requires the window to be a fixed integer. I need a rolling window where the window or lookback period is dynamic and given by another column.

In the table below, I seek the Lookbacksum which is the rolling sum of Data as specified by the Lookback column.

d={'Data':[1,1,1,2,3,2,3,2,1,2],
   'Lookback':[0,1,2,2,1,3,3,2,3,1],
   'LookbackSum':[1,2,3,4,5,8,10,7,8,3]}
df=pd.DataFrame(data=d)

eg:

   Data  Lookback  LookbackSum
0     1         0            1
1     1         1            2
2     1         2            3
3     2         2            4
4     3         1            5
5     2         3            8
6     3         3           10
7     2         2            7
8     1         3            8
9     2         1            3

You can create a custom function for use with df.apply , eg:

def lookback_window(row, values, lookback, method='sum', *args, **kwargs):
    loc = values.index.get_loc(row.name)
    lb = lookback.loc[row.name]
    return getattr(values.iloc[loc - lb: loc + 1], method)(*args, **kwargs)

Then use it as:

df['new_col'] = df.apply(lookback_window, values=df['Data'], lookback=df['Lookback'], axis=1)

There may be some corner cases but as long as your indices align and are unique - it should fulfil what you're trying to do.

here is one with a list comprehension which stores the index and value of the column df['Lookback'] and the gets the slice by reversing the values and slicing according to the column value:

df['LookbackSum'] = [sum(df.loc[:e,'Data'][::-1].to_numpy()[:i+1]) 
                           for e,i in enumerate(df['Lookback'])]

print(df)

   Data  Lookback  LookbackSum
0     1         0            1
1     1         1            2
2     1         2            3
3     2         2            4
4     3         1            5
5     2         3            8
6     3         3           10
7     2         2            7
8     1         3            8
9     2         1            3

An exercise in pain, if you want to try an almost fully vectorized approach. Sidenote: I don't think it's worth it here. At all. Inspired by Divakar's answer here

Given:

import numpy as np
import pandas as pd
d={'Data':[1,1,1,2,3,2,3,2,1,2],
   'Lookback':[0,1,2,2,1,3,3,2,3,1],
   'LookbackSum':[1,2,3,4,5,8,10,7,8,3]}
df=pd.DataFrame(data=d)

Using the function from Divakar's answer, but slightly modified

from skimage.util.shape import view_as_windows as viewW
def strided_indexing_roll(a, r, fill_value=np.nan):
    # Concatenate with sliced to cover all rolls
    p = np.full((a.shape[0],a.shape[1]-1),fill_value)
    a_ext = np.concatenate((p,a,p),axis=1)

    # Get sliding windows; use advanced-indexing to select appropriate ones
    n = a.shape[1]
    return viewW(a_ext,(1,n))[np.arange(len(r)), -r + (n-1),0]

Now, we just need to prepare a 2d array for the data and independently shift the rows according to our desired lookback values.

arr = df['Data'].to_numpy().reshape(1, -1).repeat(len(df), axis=0)
shifter = np.arange(len(df) - 1, -1, -1) #+ d['Lookback'] - 1
temp = strided_indexing_roll(arr, shifter, fill_value=0)
out = strided_indexing_roll(temp, (len(df) - 1 - df['Lookback'])*-1, 0).sum(-1)

Output:

array([ 1,  2,  3,  4,  5,  8, 10,  7,  8,  3], dtype=int64)

We can then just assign it back to the dataframe as needed and check.

df['out'] = out
#output:
    Data    Lookback    LookbackSum out
0   1   0   1   1
1   1   1   2   2
2   1   2   3   3
3   2   2   4   4
4   3   1   5   5
5   2   3   8   8
6   3   3   10  10
7   2   2   7   7
8   1   3   8   8
9   2   1   3   3

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM