
Pandas: Take rolling sum of next (1 … n) rows of a column within a group and create a new column for each sum

I have the following dataframe:

import pandas as pd

a = [1, 2, 3, 4, 5, 6, 7, 8]
x1 = ['j', 'j', 'j', 'k', 'k', 'k', 'k', 'k']
df = pd.DataFrame({'a': a, 'b': x1})

print(df)

a   b
1   j
2   j
3   j
4   k
5   k
6   k
7   k
8   k

I am trying to get the sum of the "a" values over the next n rows, grouped by column "b", and store each sum in a new column (for n ranging from 1 to 4).

Essentially I want to end up with four new columns c1, c2, c3, and c4 such that c1 has sum of "next 1" a's, c2 has sum of "next 2" a's, c3 has sum of "next 3" a's and c4 has sum of "next 4" a's.

Therefore, my desired output is:

a   b   c1      c2      c3      c4  
1   j   2.0     5.0     NaN     NaN
2   j   3.0     NaN     NaN     NaN
3   j   NaN     NaN     NaN     NaN
4   k   5.0     11.0    18.0    26.0
5   k   6.0     13.0    21.0    NaN
6   k   7.0     15.0    NaN     NaN
7   k   8.0     NaN     NaN     NaN
8   k   NaN     NaN     NaN     NaN

I looked for solutions, and the best I could come up with is something like:

for x in range(1,5): 
    df[x] = df.groupby(['b'])a[::-1].rolling(x+1).sum()[::-1] - a

but this syntax throws errors.

If possible, could you also show how to do this when I need to group by more than one field? I would really appreciate any help.

Thanks.

Your example dataframe doesn't match your expected output, so let's go with the latter.

I think you can combine a rolling sum with a shift:

for x in range(1, 5):
    c = pd.Series(df.groupby("b")["a"].rolling(x).sum().values, index=df.index)
    df[f"c{x}"]= c.groupby(df["b"]).shift(-x)

gives me

In [302]: df
Out[302]: 
   a  b   c1    c2    c3    c4
0  1  j  2.0   5.0   NaN   NaN
1  2  j  3.0   NaN   NaN   NaN
2  3  j  NaN   NaN   NaN   NaN
3  4  k  5.0  11.0  18.0  26.0
4  5  k  6.0  13.0  21.0   NaN
5  6  k  7.0  15.0   NaN   NaN
6  7  k  8.0   NaN   NaN   NaN
7  8  k  NaN   NaN   NaN   NaN
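
As a cross-check, the same columns can be built without rolling at all, by summing in-group shifts directly. This is a different technique from the one above, offered only as a minimal sketch: the helper name add_forward_sums and its parameters are made up for illustration, and it assumes "next n rows" means the n rows that follow within the same group in the frame's current row order.

```python
import pandas as pd


def add_forward_sums(df, keys, value_col="a", max_n=4, prefix="c"):
    """Hypothetical helper: add columns holding the sum of the next n in-group values."""
    out = df.copy()
    grouped = df.groupby(keys)[value_col]
    running = None
    for n in range(1, max_n + 1):
        # Value n rows ahead within the same group; NaN once we run past the group's end.
        ahead = grouped.shift(-n)
        # NaN propagates through the addition, so incomplete windows stay NaN.
        running = ahead if running is None else running + ahead
        out[f"{prefix}{n}"] = running
    return out


df = pd.DataFrame({"a": [1, 2, 3, 4, 5, 6, 7, 8],
                   "b": ["j", "j", "j", "k", "k", "k", "k", "k"]})
result = add_forward_sums(df, "b")
print(result)
```

Because every shifted Series keeps the original index, this version does not depend on how the rows are ordered relative to the groups, and keys can be a single column name or a list of names.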

If you really want to have multiple keys, you can use a list of keys, but we have to rearrange the call a little:

# "b2" is a second key column; judging by the output below, df["b2"] = list("jjjkklll")
keys = ["b","b2"]
for x in range(1, 5):
    c = pd.Series(df.groupby(keys)["a"].rolling(x).sum().values, index=df.index)
    df[f"c{x}"]= c.groupby([df[k] for k in keys]).shift(-x)

or

keys = ["b","b2"]
for x in range(1, 5):
    c = pd.Series(df.groupby(keys)["a"].rolling(x).sum().values, index=df.index)
    df[f"c{x}"]= df.assign(tmp=c).groupby(keys)["tmp"].shift(-x)

Both give me

In [409]: df
Out[409]: 
   a  b b2   c1    c2  c3  c4
0  1  j  j  2.0   5.0 NaN NaN
1  2  j  j  3.0   NaN NaN NaN
2  3  j  j  NaN   NaN NaN NaN
3  4  k  k  5.0   NaN NaN NaN
4  5  k  k  NaN   NaN NaN NaN
5  6  k  l  7.0  15.0 NaN NaN
6  7  k  l  8.0   NaN NaN NaN
7  8  k  l  NaN   NaN NaN NaN
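
One caveat with the snippets above: .values drops the index, so the rolling sums are written back by position. That lines up only when the rows already sit in the order in which groupby emits the groups, as they do here. Below is a minimal sketch that realigns by index label instead, assuming the index is unique. The helper name next_n_sums is made up for illustration, and the b2 column is reconstructed from the output above.

```python
import pandas as pd


def next_n_sums(df, keys, value_col="a", max_n=4):
    """Hypothetical helper: rolling sum plus in-group shift, realigned by index label."""
    out = df.copy()
    group_cols = [df[k] for k in keys]
    for x in range(1, max_n + 1):
        rolled = df.groupby(keys)[value_col].rolling(x).sum()
        # The result is indexed by (key levels..., original index):
        # drop the key levels, then restore the original row order.
        rolled = rolled.droplevel(list(range(len(keys)))).reindex(df.index)
        out[f"c{x}"] = rolled.groupby(group_cols).shift(-x)
    return out


df = pd.DataFrame({"a": [1, 2, 3, 4, 5, 6, 7, 8],
                   "b": list("jjjkkkkk"),
                   "b2": list("jjjkklll")})
res = next_n_sums(df, ["b", "b2"])
print(res)
```

With the rows in group order this should reproduce the table above; the realignment only matters when groups are interleaved or the frame is not sorted by the keys.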
