Pandas: Take rolling sum of next (1 … n) rows of a column within a group and create a new column for each sum

I have the following dataframe:

import pandas as pd

a = [1,2,3,4,5,6,7,8]
x1 = ['j','j','j','k','k','k','k','k']
df = pd.DataFrame({'a': a,'b':x1})

print(df)

a   b
1   j
2   j
3   j
4   k
5   k
6   k
7   k
8   k

I am trying to get the sum of the "a" values for the next n rows within each group of column "b" and store it in new columns (for n ranging from 1 to 4).

Essentially, I want to end up with four new columns c1, c2, c3, and c4, such that c1 has the sum of the "next 1" a's, c2 has the sum of the "next 2" a's, c3 has the sum of the "next 3" a's, and c4 has the sum of the "next 4" a's.

Therefore, my desired output is:

a   b   c1      c2      c3      c4  
1   j   2.0     5.0     NaN     NaN
2   j   3.0     NaN     NaN     NaN
3   j   NaN     NaN     NaN     NaN
4   k   5.0     11.0    18.0    26.0
5   k   6.0     13.0    21.0    NaN
6   k   7.0     15.0    NaN     NaN
7   k   8.0     NaN     NaN     NaN
8   k   NaN     NaN     NaN     NaN

I looked for solutions, and the best I can think of is something like:

for x in range(1,5): 
    df[x] = df.groupby(['b'])a[::-1].rolling(x+1).sum()[::-1] - a

but this syntax throws errors.

If possible, could you also share how to implement this if I need to group by more than one field? I would really appreciate any help.

Thanks.

Your example dataframe doesn't match your expected output, so let's go with the latter.

I think you can combine a rolling sum with a shift:

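for x in range(1, 5):
    # rolling sum of the current row and the x-1 rows before it, within each group
    c = pd.Series(df.groupby("b")["a"].rolling(x).sum().values, index=df.index)
    # pull the value from x rows ahead, so each row holds the sum of its next x rows
    df[f"c{x}"] = c.groupby(df["b"]).shift(-x)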

gives me

In [302]: df
Out[302]: 
   a  b   c1    c2    c3    c4
0  1  j  2.0   5.0   NaN   NaN
1  2  j  3.0   NaN   NaN   NaN
2  3  j  NaN   NaN   NaN   NaN
3  4  k  5.0  11.0  18.0  26.0
4  5  k  6.0  13.0  21.0   NaN
5  6  k  7.0  15.0   NaN   NaN
6  7  k  8.0   NaN   NaN   NaN
7  8  k  NaN   NaN   NaN   NaN
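
To see why the rolling-sum-plus-shift combination produces "next n" sums, it may help to trace the two steps on a single group first. The sketch below does that for the "k" values, then applies the same idea per group with `groupby(...).transform`. Note that `transform` is a variant swapped in here, not the code above: it returns results aligned on the original index, so it does not depend on the positional `.values` assignment, which (as far as I can tell) assumes the rows are already ordered by group. The names `s`, `trailing`, and `forward` are illustrative only.

```python
import pandas as pd

# One group's values (the "k" rows from the question), to trace the two steps.
s = pd.Series([4, 5, 6, 7, 8])
trailing = s.rolling(2).sum()   # sum of the current row and the one before it
forward = trailing.shift(-2)    # row i now holds the value computed at row i + 2

# Same idea per group via transform, which aligns results on the original index.
df = pd.DataFrame({"a": [1, 2, 3, 4, 5, 6, 7, 8],
                   "b": ["j", "j", "j", "k", "k", "k", "k", "k"]})
for n in range(1, 5):
    df[f"c{n}"] = df.groupby("b")["a"].transform(
        lambda g: g.rolling(n).sum().shift(-n)
    )
print(df)
```

For the first row of the "k" group, `forward` holds 11.0, which is 5 + 6, the sum of the two rows that follow it.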

If you really want to have multiple keys, you can use a list of keys, but we have to rearrange the call a little:

keys = ["b","b2"]
for x in range(1, 5):
    c = pd.Series(df.groupby(keys)["a"].rolling(x).sum().values, index=df.index)
    df[f"c{x}"]= c.groupby([df[k] for k in keys]).shift(-x)

or

keys = ["b","b2"]
for x in range(1, 5):
    c = pd.Series(df.groupby(keys)["a"].rolling(x).sum().values, index=df.index)
    df[f"c{x}"]= df.assign(tmp=c).groupby(keys)["tmp"].shift(-x)

give me

In [409]: df
Out[409]: 
   a  b b2   c1    c2  c3  c4
0  1  j  j  2.0   5.0 NaN NaN
1  2  j  j  3.0   NaN NaN NaN
2  3  j  j  NaN   NaN NaN NaN
3  4  k  k  5.0   NaN NaN NaN
4  5  k  k  NaN   NaN NaN NaN
5  6  k  l  7.0  15.0 NaN NaN
6  7  k  l  8.0   NaN NaN NaN
7  8  k  l  NaN   NaN NaN NaN
