Pandas groupby cumulative/rolling sum, average, and std
I have a DataFrame (df) like the one below:
month-year name a b c
2018-01 X 2 1 4
2018-01 Y 1 0 5
2018-01 X 1 6 3
2018-01 Y 4 10 7
2018-02 X 13 4 2
2018-02 Y 22 13 9
2018-02 X 3 7 4
2018-02 Y 2 15 0
I want to group by month-year and name to get the sum of column a, the average of column b, and the std of column c. However, I want the sum, average, and std to be rolling/cumulative numbers.
For example, for this dataset, to get the output I want for a, I can do something like
df.groupby(['month-year','name']).agg(sum).groupby(level=[1]).agg({'a':np.cumsum})
to get something like
month-year name a
2018-01 X 3
Y 5
2018-02 X 19
Y 29
What can I do to find the cumulative average of b and the cumulative std of c, to get an output that looks like this?
month-year name a b c
2018-01 X 3 3.5 0.71
Y 5 5 1.41
2018-02 X 19 4.5 0.96
Y 29 9.5 3.86
Thank you.
You can do this with expanding.
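As a quick illustration of what expanding does, here is a minimal sketch on a single Series. The values are column b for name X from the question; the variable name s is only for illustration. Each position is computed over all rows from the start up to and including that row.

```python
import pandas as pd

# Column b for name X, in row order (values taken from the question's data)
s = pd.Series([1, 6, 4, 7])

# Expanding window: each result uses every value seen so far
running_sum = s.expanding().sum()    # same values as s.cumsum()
running_mean = s.expanding().mean()
running_std = s.expanding().std()    # sample std (ddof=1); NaN for the first row

print(pd.DataFrame({"sum": running_sum, "mean": running_mean, "std": running_std}))
```

The running mean after the second and fourth rows is 3.5 and 4.5, which are the values the question expects for X at the end of 2018-01 and 2018-02.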
The first step is to calculate the expanding sum, mean, and std for each of your columns, grouping only by 'name', and to join that back to the original DataFrame.
Then you want to group by ['month-year', 'name'] and select the last row within each group.
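# Expanding (cumulative) statistics within each name, realigned to df's original index
df = df.join(df.groupby(['name']).expanding().agg({'a': 'sum', 'b': 'mean', 'c': 'std'})
               .reset_index(level=0, drop=True)
               .add_suffix('_roll'))
# Keep the last cumulative value in each (month-year, name) group
df.groupby(['month-year', 'name']).last().drop(columns=['a', 'b', 'c'])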
Output:
a_roll b_roll c_roll
month-year name
2018-01 X 3.0 3.5 0.707107
Y 5.0 5.0 1.414214
2018-02 X 19.0 4.5 0.957427
Y 29.0 9.5 3.862210
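For anyone who wants to reproduce this end to end, the following is a self-contained sketch that rebuilds the question's data and computes each statistic column by column instead of through a single agg call. This is a variation on the approach above, not the original answer's code; the names by_name, stats, and result are illustrative.

```python
import pandas as pd

# Rebuild the DataFrame from the question
df = pd.DataFrame({
    "month-year": ["2018-01"] * 4 + ["2018-02"] * 4,
    "name": ["X", "Y"] * 4,
    "a": [2, 1, 1, 4, 13, 22, 3, 2],
    "b": [1, 0, 6, 10, 4, 13, 7, 15],
    "c": [4, 5, 3, 7, 2, 9, 4, 0],
})

by_name = df.groupby("name")

# Cumulative statistics per name. The expanding results carry a
# (name, original index) MultiIndex, so the name level is dropped
# to align them with df's original row index.
stats = pd.DataFrame({
    "month-year": df["month-year"],
    "name": df["name"],
    "a_roll": by_name["a"].cumsum(),
    "b_roll": by_name["b"].expanding().mean().reset_index(level=0, drop=True),
    "c_roll": by_name["c"].expanding().std().reset_index(level=0, drop=True),
})

# The last row of each (month-year, name) group holds the cumulative
# value as of the end of that month
result = stats.groupby(["month-year", "name"]).last()
print(result)
```

Note that this relies on the rows already being in chronological order within each name, as they are in the question's data; if they are not, sort by month-year before computing the cumulative values.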