I have the following dataframe:
import pandas as pd
a = [1,2,3,4,5,6,7,8]
x1 = ['j','j','j','k','k','k','k','k']
df = pd.DataFrame({'a': a,'b':x1})
print(df)
a b
1 j
2 j
3 j
4 k
5 k
6 k
7 k
8 k
I am trying to get the sum of the "a" values for the next n rows, grouped by column "b", and store it in new columns (for n ranging from 1 to 4).
Essentially, I want to end up with four new columns c1, c2, c3, and c4, such that c1 has the sum of the "next 1" a's, c2 has the sum of the "next 2" a's, c3 has the sum of the "next 3" a's, and c4 has the sum of the "next 4" a's.
Therefore, my desired output is:
a b c1 c2 c3 c4
1 j 2.0 5.0 NaN NaN
2 j 3.0 NaN NaN NaN
3 j NaN NaN NaN NaN
4 k 5.0 11.0 18.0 26.0
5 k 6.0 13.0 21.0 NaN
6 k 7.0 15.0 NaN NaN
7 k 8.0 NaN NaN NaN
8 k NaN NaN NaN NaN
I looked for solutions, and the best I could come up with is something like:
for x in range(1,5):
    df[x] = df.groupby(['b'])a[::-1].rolling(x+1).sum()[::-1] - a
but this syntax throws errors.
If possible, could you also share how to implement this when I need to group by more than one field? I would really appreciate any help.
Thanks.
Your example dataframe doesn't match your expected output, so let's go with the latter.
I think you can combine a rolling sum with a shift:
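for x in range(1, 5):
    # Trailing x-row sum per group. .values drops the (b, index) MultiIndex,
    # which assumes the rows of df are already ordered by "b".
    c = pd.Series(df.groupby("b")["a"].rolling(x).sum().values, index=df.index)
    # Shift back by x within each group so row i holds the sum of the next x rows.
    df[f"c{x}"] = c.groupby(df["b"]).shift(-x)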
gives me
In [302]: df
Out[302]:
a b c1 c2 c3 c4
0 1 j 2.0 5.0 NaN NaN
1 2 j 3.0 NaN NaN NaN
2 3 j NaN NaN NaN NaN
3 4 k 5.0 11.0 18.0 26.0
4 5 k 6.0 13.0 21.0 NaN
5 6 k 7.0 15.0 NaN NaN
6 7 k 8.0 NaN NaN NaN
7 8 k NaN NaN NaN NaN
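To see why a rolling sum followed by a negative shift produces "next n" sums, it may help to look at a single group in isolation. The following is only a small sketch on a plain Series holding the "a" values of group k; the names s, trailing, and forward are mine, for illustration.

```python
import pandas as pd

s = pd.Series([4, 5, 6, 7, 8])  # the "a" values of group k
x = 2

# Row i holds s[i-x+1] + ... + s[i]; the first x-1 rows are NaN.
trailing = s.rolling(x).sum()

# Move every value up by x rows: row i now holds s[i+1] + ... + s[i+x].
forward = trailing.shift(-x)

print(pd.DataFrame({"a": s, "trailing": trailing, "forward": forward}))
```

The forward column matches c2 for group k in the output above (11, 13, 15, then NaN where fewer than two rows remain).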
If you really want to have multiple keys, you can use a list of keys, but we have to rearrange the call a little:
keys = ["b", "b2"]
for x in range(1, 5):
    c = pd.Series(df.groupby(keys)["a"].rolling(x).sum().values, index=df.index)
    df[f"c{x}"] = c.groupby([df[k] for k in keys]).shift(-x)
or
keys = ["b", "b2"]
for x in range(1, 5):
    c = pd.Series(df.groupby(keys)["a"].rolling(x).sum().values, index=df.index)
    df[f"c{x}"] = df.assign(tmp=c).groupby(keys)["tmp"].shift(-x)
give me
In [409]: df
Out[409]:
a b b2 c1 c2 c3 c4
0 1 j j 2.0 5.0 NaN NaN
1 2 j j 3.0 NaN NaN NaN
2 3 j j NaN NaN NaN NaN
3 4 k k 5.0 NaN NaN NaN
4 5 k k NaN NaN NaN NaN
5 6 k l 7.0 15.0 NaN NaN
6 7 k l 8.0 NaN NaN NaN
7 8 k l NaN NaN NaN NaN
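As an alternative sketch (not the code above), the same idea can be written with groupby(...).transform, which keeps the original index and so does not depend on the rows being ordered by the keys. The helper name add_forward_sums and its parameters are hypothetical; the b2 values are taken from the output shown above.

```python
import pandas as pd

def add_forward_sums(df, keys, value="a", max_n=4):
    """Return a copy of df with columns c1..c{max_n}, where c{n} is the
    sum of the next n `value` rows within each group defined by `keys`."""
    out = df.copy()
    grouped = df.groupby(keys)[value]
    for n in range(1, max_n + 1):
        # Per group: trailing n-row sum, then shift it back by n rows.
        out[f"c{n}"] = grouped.transform(
            lambda s, n=n: s.rolling(n).sum().shift(-n)
        )
    return out

df = pd.DataFrame({
    "a": [1, 2, 3, 4, 5, 6, 7, 8],
    "b": list("jjjkkkkk"),
    "b2": list("jjjkklll"),
})
result = add_forward_sums(df, ["b", "b2"])
print(result)
```

Passing a single column name, as in add_forward_sums(df, "b"), should give the single-key result from earlier.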