Pandas groupby cumulative/rolling sum, average, and std

I have a DataFrame (df) like the one below:

month-year    name    a    b    c
2018-01       X       2    1    4
2018-01       Y       1    0    5
2018-01       X       1    6    3
2018-01       Y       4    10   7
2018-02       X       13   4    2
2018-02       Y       22   13   9
2018-02       X       3    7    4
2018-02       Y       2    15   0

I want to group by month-year and name to get the sum of column a, the average of column b, and the standard deviation of column c. However, I want the sum, average, and std to be cumulative (running) values across the months for each name.

For example, to get the output I want for a on this dataset, I can do something like

df.groupby(['month-year', 'name'])[['a']].sum().groupby(level='name').cumsum()

to get something like

month-year    name    a
2018-01       X       3
              Y       5
2018-02       X       19
              Y       29
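A self-contained sketch of that cumulative-sum step on the sample data. It uses the column name 'month-year' as it appears in the table, sums a per (month-year, name) pair, and then accumulates those monthly totals separately for each name with groupby(level='name').cumsum(). The names monthly_a and cum_a are illustrative.

```python
import pandas as pd

# The sample data from the question.
df = pd.DataFrame({
    'month-year': ['2018-01'] * 4 + ['2018-02'] * 4,
    'name': ['X', 'Y', 'X', 'Y', 'X', 'Y', 'X', 'Y'],
    'a': [2, 1, 1, 4, 13, 22, 3, 2],
    'b': [1, 0, 6, 10, 4, 13, 7, 15],
    'c': [4, 5, 3, 7, 2, 9, 4, 0],
})

# Total of a within each (month-year, name) pair; the result index is sorted,
# so months appear in chronological order for YYYY-MM strings.
monthly_a = df.groupby(['month-year', 'name'])[['a']].sum()

# Running total of those monthly sums, computed independently for each name.
cum_a = monthly_a.groupby(level='name').cumsum()
print(cum_a)
```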

What can I do to get the cumulative average of b and the cumulative std of c, so that the output looks like this?

month-year    name    a    b    c
2018-01       X       3    3.5  0.71
              Y       5    5    1.41
2018-02       X       19   4.5  0.96
              Y       29   9.5  3.86
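The c values in the expected output are sample standard deviations (divisor n - 1, which is also the pandas default ddof=1 for std). For example, for X in 2018-02 the cumulative c values are 4, 3, 2 and 4:

```latex
\bar{c} = \frac{4 + 3 + 2 + 4}{4} = 3.25, \qquad
s = \sqrt{\frac{(0.75)^2 + (-0.25)^2 + (-1.25)^2 + (0.75)^2}{4 - 1}}
  = \sqrt{\frac{2.75}{3}} \approx 0.957
```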

Thank you.

You can do this with an expanding window (groupby(...).expanding()).

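The first step is to calculate the expanding sum of a, mean of b and std of c, grouping only by 'name', and to join the result back to the original DataFrame. Each row then holds the statistics over all rows for that name up to and including itself, so this relies on the rows being in chronological order within each name (as they are in the sample data).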
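A minimal, self-contained sketch of this first step on the sample data, to show what the intermediate per-row values look like. It computes the expanding statistics one column at a time rather than passing a dictionary to agg (an equivalent way of spelling out the same sum/mean/std), and adds a stable sort on month-year as a precaution, assuming the YYYY-MM strings sort chronologically. The names rolled and per_row are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'month-year': ['2018-01'] * 4 + ['2018-02'] * 4,
    'name': ['X', 'Y', 'X', 'Y', 'X', 'Y', 'X', 'Y'],
    'a': [2, 1, 1, 4, 13, 22, 3, 2],
    'b': [1, 0, 6, 10, 4, 13, 7, 15],
    'c': [4, 5, 3, 7, 2, 9, 4, 0],
})

# Expanding windows only look backwards in row order, so keep rows
# chronological; a stable sort preserves the original order within a month.
df = df.sort_values('month-year', kind='stable')

g = df.groupby('name')
# Each result is indexed by (name, original row label); dropping the 'name'
# level lets it align with df's own index in join().
rolled = pd.DataFrame({
    'a_roll': g['a'].expanding().sum().reset_index(level=0, drop=True),
    'b_roll': g['b'].expanding().mean().reset_index(level=0, drop=True),
    'c_roll': g['c'].expanding().std().reset_index(level=0, drop=True),
})
per_row = df.join(rolled)

# The first X row has a NaN c_roll: the sample std of one value is undefined.
print(per_row[per_row['name'] == 'X'])
```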

Then group by ['month-year', 'name'] and select the last row within each group: that row holds the cumulative values up to the end of that month.

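# Expanding (cumulative) sum of a, mean of b and std of c within each name,
# aligned back to the original rows (rows must be in chronological order).
df = df.join(df.groupby(['name']).expanding().agg({'a': 'sum', 'b': 'mean', 'c': 'std'})
               .reset_index(level=0, drop=True)
               .add_suffix('_roll'))

# The last row of each (month-year, name) group carries the cumulative values
# up to the end of that month; drop the original per-row columns.
df.groupby(['month-year', 'name']).last().drop(columns=['a', 'b', 'c'])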

Output:

                 a_roll  b_roll    c_roll
month-year name                          
2018-01    X        3.0     3.5  0.707107
           Y        5.0     5.0  1.414214
2018-02    X       19.0     4.5  0.957427
           Y       29.0     9.5  3.862210
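As an alternative sketch (not the approach in this answer), the same numbers can be obtained without a per-row expanding window: aggregate each month's count, sum, and sum of squares per name, accumulate those with cumsum, and derive the mean and the sample std (ddof=1) from the running totals. This assumes no missing values and that month-year strings sort chronologically; the running-sum variance formula can also lose precision when values are large relative to their spread. The names monthly, running and result are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'month-year': ['2018-01'] * 4 + ['2018-02'] * 4,
    'name': ['X', 'Y', 'X', 'Y', 'X', 'Y', 'X', 'Y'],
    'a': [2, 1, 1, 4, 13, 22, 3, 2],
    'b': [1, 0, 6, 10, 4, 13, 7, 15],
    'c': [4, 5, 3, 7, 2, 9, 4, 0],
})

# Per (month-year, name): sum of a, row count, sum of b, sum and sum of squares of c.
monthly = (
    df.assign(c_sq=df['c'] ** 2)
      .groupby(['month-year', 'name'])
      .agg(a=('a', 'sum'), n=('b', 'count'), b_sum=('b', 'sum'),
           c_sum=('c', 'sum'), c_sq=('c_sq', 'sum'))
)

# Running totals over the months, separately for each name.
running = monthly.groupby(level='name').cumsum()

n = running['n']
result = pd.DataFrame({
    'a': running['a'],
    'b': running['b_sum'] / n,
    # Sample variance from running sums: (S2 - S1**2 / n) / (n - 1).
    'c': np.sqrt((running['c_sq'] - running['c_sum'] ** 2 / n) / (n - 1)),
})
print(result.round(2))
```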
