Get the difference between max and min for a groupby in pandas and calculate the average
I have a dataframe like below:
ticker fy fp value f_date rn
MSFT 2009 0 144 2010-01-01T12:12:34 0
AAPL 2010 0 144 2010-01-01T12:12:34 0
MSFT 2009 0 48 2014-05-01T12:12:34 1
AAPL 2011 0 80 2012-01-01T12:12:34 1
GOOG 2010 0 40 2010-01-01T12:12:34 0
I just want to group this data by ticker, fy, fp, like below:
df.groupby(by=['ticker', 'fy', 'fp'])
On the basis of this, I just want to calculate the difference between the max and min of f_date and divide it by the max of rn. For example, for the group MSFT, 2009, 0, the max date is 2014-05-01T12:12:34, the min date is 2010-01-01T12:12:34, and the max rn is 1, so I want to calculate it as (max(f_date) - min(f_date)) / (max(rn) + 1). That way I'll get the number of days between these two dates, so I can map this data with other data to do some analysis.
I'm unable to move forward after the groupby.
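For a reproducible test, the sample frame above can be rebuilt like this (the dtypes are my assumption; f_date is parsed to datetime up front, which the solutions below need anyway):

```python
import pandas as pd

# Rebuild the sample data from the question
df = pd.DataFrame({
    'ticker': ['MSFT', 'AAPL', 'MSFT', 'AAPL', 'GOOG'],
    'fy':     [2009, 2010, 2009, 2011, 2010],
    'fp':     [0, 0, 0, 0, 0],
    'value':  [144, 144, 48, 80, 40],
    'f_date': ['2010-01-01T12:12:34', '2010-01-01T12:12:34',
               '2014-05-01T12:12:34', '2012-01-01T12:12:34',
               '2010-01-01T12:12:34'],
    'rn':     [0, 0, 1, 1, 0],
})
# Parse the ISO strings into real datetimes so min/max and subtraction work
df['f_date'] = pd.to_datetime(df['f_date'])
```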
For pandas 0.25+ it is possible to use named aggregations, then subtract and divide the columns:
df['f_date'] = pd.to_datetime(df['f_date'])
df = df.groupby(by=['ticker', 'fy', 'fp']).agg(min1=('f_date','min'),
max1=('f_date','max'),
rn=('rn', 'max'))
df['new'] = df['max1'].sub(df['min1']).div(df['rn'].add(1))
print (df)
min1 max1 rn new
ticker fy fp
AAPL 2010 0 2010-01-01 12:12:34 2010-01-01 12:12:34 0 0 days 00:00:00
2011 0 2012-01-01 12:12:34 2012-01-01 12:12:34 1 0 days 00:00:00
GOOG 2010 0 2010-01-01 12:12:34 2010-01-01 12:12:34 0 0 days 00:00:00
MSFT 2009 0 2010-01-01 12:12:34 2014-05-01 12:12:34 1 790 days 12:00:00
Or, if necessary, convert the difference of datetimes (timedeltas) to seconds with Series.dt.total_seconds:
df['new'] = df['max1'].sub(df['min1']).dt.total_seconds().div(df['rn'].add(1))
print (df)
min1 max1 rn new
ticker fy fp
AAPL 2010 0 2010-01-01 12:12:34 2010-01-01 12:12:34 0 0.0
2011 0 2012-01-01 12:12:34 2012-01-01 12:12:34 1 0.0
GOOG 2010 0 2010-01-01 12:12:34 2010-01-01 12:12:34 0 0.0
MSFT 2009 0 2010-01-01 12:12:34 2014-05-01 12:12:34 1 68299200.0
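Since the question asks for the number of days between the dates, the timedelta can also be divided by one day to get a float day count instead of seconds. A minimal sketch on the MSFT group's aggregated values (the min1/max1/rn column names follow the agg above):

```python
import numpy as np
import pandas as pd

# Aggregated values for the MSFT, 2009, 0 group from the example
agg = pd.DataFrame({
    'min1': pd.to_datetime(['2010-01-01 12:12:34']),
    'max1': pd.to_datetime(['2014-05-01 12:12:34']),
    'rn': [1],
})
# Divide the timedelta by one day to get fractional days, then by max(rn) + 1
agg['days'] = (agg['max1'].sub(agg['min1'])
                          .div(np.timedelta64(1, 'D'))
                          .div(agg['rn'].add(1)))
# 1581 days between the dates, halved -> 790.5
```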
Solution for older pandas versions:
df['f_date'] = pd.to_datetime(df['f_date'])
df = df.groupby(by=['ticker', 'fy', 'fp']).agg({'f_date':['min','max'],
'rn':'max'})
df.columns = df.columns.map('_'.join)
df['new'] = df['f_date_max'].sub(df['f_date_min']).div(df['rn_max'].add(1))
print (df)
f_date_min f_date_max rn_max \
ticker fy fp
AAPL 2010 0 2010-01-01 12:12:34 2010-01-01 12:12:34 0
2011 0 2012-01-01 12:12:34 2012-01-01 12:12:34 1
GOOG 2010 0 2010-01-01 12:12:34 2010-01-01 12:12:34 0
MSFT 2009 0 2010-01-01 12:12:34 2014-05-01 12:12:34 1
new
ticker fy fp
AAPL 2010 0 0 days 00:00:00
2011 0 0 days 00:00:00
GOOG 2010 0 0 days 00:00:00
MSFT 2009 0 790 days 12:00:00
Last, if necessary, convert the MultiIndex to columns:
df = df.reset_index()
print (df)
ticker fy fp f_date_min f_date_max rn_max \
0 AAPL 2010 0 2010-01-01 12:12:34 2010-01-01 12:12:34 0
1 AAPL 2011 0 2012-01-01 12:12:34 2012-01-01 12:12:34 1
2 GOOG 2010 0 2010-01-01 12:12:34 2010-01-01 12:12:34 0
3 MSFT 2009 0 2010-01-01 12:12:34 2014-05-01 12:12:34 1
new
0 0 days 00:00:00
1 0 days 00:00:00
2 0 days 00:00:00
3 790 days 12:00:00