Pandas: Sum Previous N Rows by Group
I want to sum the prior N periods of data for each group. I have seen how to do each individually (sum by group, or sum prior N periods), but can't figure out a clean way to do both together.
I'm currently doing the following:
import pandas as pd

sample_data = {'user': ['a', 'a', 'a', 'a', 'a', 'b', 'b', 'b', 'b', 'b'],
               'clicks': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]}
df = pd.DataFrame(sample_data)
# One lagged column per prior period, shifted within each user
df['clicks.1'] = df.groupby(['user'])['clicks'].shift(1)
df['clicks.2'] = df.groupby(['user'])['clicks'].shift(2)
df['clicks.3'] = df.groupby(['user'])['clicks'].shift(3)
# min_count=1 keeps NaN where all three lags are missing (pandas >= 0.22 returns 0.0 otherwise)
df['total_clicks_prior3'] = df[['clicks.1', 'clicks.2', 'clicks.3']].sum(axis=1, min_count=1)
I don't want the 3 intermediate lagged columns, I just want their sum, so my desired output is:
>>> df[['clicks','user','total_clicks_prior3']]
clicks user total_clicks_prior3
0 0 a NaN
1 1 a 0.0
2 2 a 1.0
3 3 a 3.0
4 4 a 6.0
5 5 b NaN
6 6 b 5.0
7 7 b 11.0
8 8 b 18.0
9 9 b 21.0
Note: I could obviously drop the 3 columns after creating them, but given that I will be creating multiple columns with different numbers of lagged periods, I feel like there has to be an easier way.
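The lag-and-sum idea from the question can also be written so the lagged Series never become columns of df. The sketch below is a minimal illustration, assuming pandas 0.22 or later (for `min_count`); the helper name `sum_prior_n_lags` is hypothetical. It builds the n shifted copies in memory, sums them row-wise, and uses `min_count=1` so the first row of each group stays NaN, as in the desired output.

```python
import pandas as pd


def sum_prior_n_lags(df, group_col, value_col, n):
    """Hypothetical helper: sum the previous n values of value_col within
    each group, excluding the current row."""
    grouped = df.groupby(group_col)[value_col]
    # The n lagged Series exist only inside this concat; df gets no extra columns.
    lags = pd.concat([grouped.shift(k) for k in range(1, n + 1)], axis=1)
    # min_count=1 returns NaN (not 0.0) when every lag is missing,
    # i.e. on the first row of each group.
    return lags.sum(axis=1, min_count=1)


sample_data = {'user': ['a'] * 5 + ['b'] * 5, 'clicks': list(range(10))}
df = pd.DataFrame(sample_data)
df['total_clicks_prior3'] = sum_prior_n_lags(df, 'user', 'clicks', 3)
```

Because n is a parameter, creating several columns with different lag counts is a loop over calls to the same helper.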
This is groupby + rolling + shift:
df.groupby('user')['clicks'].rolling(3, min_periods=1).sum().groupby(level=0).shift()
user
a 0 NaN
1 0.0
2 1.0
3 3.0
4 6.0
b 5 NaN
6 5.0
7 11.0
8 18.0
9 21.0
Name: clicks, dtype: float64
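The Series above is indexed by (user, original row label), so it does not line up with df's plain index. A minimal sketch, assuming pandas 1.x or later (where groupby(...).rolling(...) prepends the group key as an index level), drops that level before assigning, and loops over several window sizes to create one column per N without any intermediate columns:

```python
import pandas as pd

df = pd.DataFrame({'user': ['a'] * 5 + ['b'] * 5, 'clicks': list(range(10))})

for n in (2, 3):
    prior_sum = (
        df.groupby('user')['clicks']
        .rolling(n, min_periods=1)
        .sum()
        # Shift within each user so a row's own value is not in its total.
        .groupby(level=0)
        .shift()
    )
    # Drop the 'user' index level so the result aligns with df's index;
    # assignment then matches rows by label even if df is not sorted by user.
    df[f'total_clicks_prior{n}'] = prior_sum.droplevel(0)
```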
If you have a solution that works for each group, you can use apply to run it on the groupby object. For instance, the question you linked to has df['A'].rolling(min_periods=1, window=11).sum() as an answer. If that does what you want on the subgroups, you can do:
df.groupby('user')['clicks'].apply(lambda x: x.rolling(min_periods=1, window=11).sum())
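As written, the per-group expression keeps the linked question's window=11 and includes the current row in each window. For the "prior 3 periods" case in this question, a minimal sketch uses window=3 and adds shift() so each total covers only earlier rows. It swaps apply for transform (a substitution, not the answer's original call), because transform returns a result aligned with df's original index that can be assigned as a column directly; this assumes pandas 1.x or later.

```python
import pandas as pd

df = pd.DataFrame({'user': ['a'] * 5 + ['b'] * 5, 'clicks': list(range(10))})

# For each user: rolling 3-row sum, then shift down one row so the current
# row is excluded from its own total. transform keeps df's original index.
df['total_clicks_prior3'] = df.groupby('user')['clicks'].transform(
    lambda s: s.rolling(window=3, min_periods=1).sum().shift()
)
```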