Pandas dataframe groupby multiple years rolling stat

I have a pandas dataframe for which I'm trying to compute an expanding-window aggregation after grouping by columns. The data structure is something like this:

import pandas as pd

df = pd.DataFrame([['A',1,2015,4],['A',1,2016,5],['A',1,2017,6],['B',1,2015,10],['B',1,2016,11],['B',1,2017,12],
               ['A',1,2015,24],['A',1,2016,25],['A',1,2017,26],['B',1,2015,30],['B',1,2016,31],['B',1,2017,32],
              ['A',2,2015,4],['A',2,2016,5],['A',2,2017,6],['B',2,2015,10],['B',2,2016,11],['B',2,2017,12]],columns=['Typ','ID','Year','dat'])\
.sort_values(by=['Typ','ID','Year'])

i.e.

    Typ ID  Year    dat
0   A   1   2015    4
6   A   1   2015    24
1   A   1   2016    5
7   A   1   2016    25
2   A   1   2017    6
8   A   1   2017    26
12  A   2   2015    4
13  A   2   2016    5
14  A   2   2017    6
3   B   1   2015    10
9   B   1   2015    30
4   B   1   2016    11
10  B   1   2016    31
5   B   1   2017    12
11  B   1   2017    32
15  B   2   2015    10
16  B   2   2016    11
17  B   2   2017    12

In general, the number of years per Typ-ID and the number of rows per Typ-ID-Year both vary freely. I need to group this dataframe by the columns Typ and ID, then compute an expanding-window median and std of all observations, stepping by Year. I would like to get output like this:

    Typ ID  Year    median  std
0   A   1   2015    14.0    14.14
1   A   1   2016    14.5    11.56
2   A   1   2017    15.0    10.99
3   A   2   2015    4.0     0
4   A   2   2016    4.5     0
5   A   2   2017    5.0     0
6   B   1   2015    20.0    14.14
7   B   1   2016    20.5    11.56
8   B   1   2017    21.0    10.99
9   B   2   2015    10.0    0
10  B   2   2016    10.5    0
11  B   2   2017    11.0    0

Hence, I want something like a groupby on ['Typ','ID','Year'], with the median and std for each Typ-ID-Year computed over all data with the same Typ-ID, cumulatively up to and including that Year.
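To make the intended semantics concrete, the definition can be spelled out as an explicit loop. This is only a reference sketch on a trimmed sample (the names `rows`, `seen`, and `reference` are made up for illustration), and it is exactly the kind of manual iteration the question is trying to avoid:

```python
import pandas as pd

# Trimmed sample with the question's column names.
df = pd.DataFrame(
    [['A', 1, 2015, 4], ['A', 1, 2015, 24],
     ['A', 1, 2016, 5], ['A', 1, 2016, 25]],
    columns=['Typ', 'ID', 'Year', 'dat'],
)

rows = []
for (typ, id_), grp in df.groupby(['Typ', 'ID']):
    for year in sorted(grp['Year'].unique()):
        # All observations of this Typ-ID up to and including `year`.
        seen = grp.loc[grp['Year'] <= year, 'dat']
        rows.append((typ, id_, int(year), seen.median(), seen.std()))

reference = pd.DataFrame(rows, columns=['Typ', 'ID', 'Year', 'median', 'std'])
print(reference.round(2))
```

A loop like this is handy for checking a vectorised solution against the definition, but it rescans each group once per year.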

How can I do this without manual iteration?

There's been no activity on this question, so I'll post the solution I found.

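# Expanding median per (Typ, ID); reset_index exposes the original row index as 'level_2'
mn = df.groupby(by=['Typ','ID']).dat.expanding().median().reset_index().set_index('level_2')
mylast = lambda x: x.iloc[-1]
# Recover Year by joining on the original row index
mn = mn.join(df['Year'])
# The last row of each Typ-ID-Year has seen all data up to and including that Year
mn = mn.groupby(by=['Typ','ID','Year']).agg(mylast).reset_index()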

My solution follows this algorithm:

  1. group the data by Typ and ID, compute the expanding median, and get the original index back
  2. with the original index back, join the Year column back in from the original dataframe
  3. group by Typ, ID and Year, taking the last (in row order) value for each group; this relies on the dataframe being sorted by Year within each Typ-ID

This gives the desired output for the median. The same process can be followed for the standard deviation (or any other statistic desired).
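As a rough sketch of how the same process extends to the standard deviation, both statistics can be taken from one expanding object and reduced together. The variable names (`exp`, `stats`, `result`) and the trimmed sample are illustrative only, and `groupby(...).tail(1)` is swapped in here for the `mylast` lambda as one possible way to keep the last row per group:

```python
import pandas as pd

# Trimmed sample: Typ 'A' only, same column names as the question.
df = pd.DataFrame(
    [['A', 1, 2015, 4], ['A', 1, 2016, 5], ['A', 1, 2017, 6],
     ['A', 1, 2015, 24], ['A', 1, 2016, 25], ['A', 1, 2017, 26],
     ['A', 2, 2015, 4], ['A', 2, 2016, 5], ['A', 2, 2017, 6]],
    columns=['Typ', 'ID', 'Year', 'dat'],
).sort_values(by=['Typ', 'ID', 'Year'])

# Expanding median and std within each (Typ, ID) group, in row order.
exp = df.groupby(['Typ', 'ID'])['dat'].expanding()
stats = pd.DataFrame({'median': exp.median(), 'std': exp.std()})

# Move the group keys back to columns (leaving the original row index),
# then bring Year back by joining on that index.
stats = stats.reset_index(level=['Typ', 'ID']).join(df['Year'])

# Keep the last row of each (Typ, ID, Year): it has seen every row of that
# Typ-ID up to and including that Year.
result = stats.groupby(['Typ', 'ID', 'Year']).tail(1).reset_index(drop=True)
result = result[['Typ', 'ID', 'Year', 'median', 'std']]
print(result.round(2))
```

Note that pandas computes the sample standard deviation (ddof=1) by default, so a Typ-ID whose window holds a single observation gets NaN rather than 0. On this sample the (A, 2) rows come out as NaN, 0.71 and 1.0.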
