
Get the difference between max and min for a groupby in pandas and calculate the average

I have a dataframe like the one below:

ticker  fy  fp  value   f_date  rn
MSFT    2009    0   144 2010-01-01T12:12:34 0
AAPL    2010    0   144 2010-01-01T12:12:34 0
MSFT    2009    0   48  2014-05-01T12:12:34 1
AAPL    2011    0   80  2012-01-01T12:12:34 1
GOOG    2010    0   40  2010-01-01T12:12:34 0
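
For anyone who wants to reproduce the snippets on this page, the sample frame can be rebuilt roughly as follows. This is only a sketch: the values are copied from the table above, and building the frame from a list of tuples (the `rows` name is just illustrative) is one convenient option among several.

```python
import pandas as pd

# Sample rows copied from the table above.
rows = [
    ("MSFT", 2009, 0, 144, "2010-01-01T12:12:34", 0),
    ("AAPL", 2010, 0, 144, "2010-01-01T12:12:34", 0),
    ("MSFT", 2009, 0, 48, "2014-05-01T12:12:34", 1),
    ("AAPL", 2011, 0, 80, "2012-01-01T12:12:34", 1),
    ("GOOG", 2010, 0, 40, "2010-01-01T12:12:34", 0),
]
df = pd.DataFrame(rows, columns=["ticker", "fy", "fp", "value", "f_date", "rn"])

# f_date is still a plain string column at this point;
# it has to be parsed with pd.to_datetime before doing date arithmetic.
print(df.dtypes)
```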

I want to group this data by ticker, fy and fp, as shown below:

df.groupby(by=['ticker', 'fy', 'fp'])

Based on this grouping, I want to calculate the difference between the max and min of f_date and divide it by the max of rn plus one. For example, for the group MSFT, 2009, 0, the max date is 2014-05-01T12:12:34, the min date is 2010-01-01T12:12:34, and the max rn is 1, so I want to calculate (max(f_date) - min(f_date)) / max(rn + 1). That way I get the days in between these two dates, and I can map this data against other data to do some analysis.

I'm unable to move forward after the groupby.

For pandas 0.25+ it is possible to use named aggregations, then subtract and divide the columns:

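import pandas as pd

# Parse the ISO strings so min/max/subtraction work on real datetimes
df['f_date'] = pd.to_datetime(df['f_date'])
# Named aggregation: one output column per (source column, function) pair
df = df.groupby(by=['ticker', 'fy', 'fp']).agg(min1=('f_date','min'),
                                               max1=('f_date','max'),
                                               rn=('rn', 'max'))

# (max - min) / (max(rn) + 1), giving one timedelta per group
df['new'] = df['max1'].sub(df['min1']).div(df['rn'].add(1))
print(df)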
                              min1                max1  rn               new
ticker fy   fp                                                              
AAPL   2010 0  2010-01-01 12:12:34 2010-01-01 12:12:34   0   0 days 00:00:00
       2011 0  2012-01-01 12:12:34 2012-01-01 12:12:34   1   0 days 00:00:00
GOOG   2010 0  2010-01-01 12:12:34 2010-01-01 12:12:34   0   0 days 00:00:00
MSFT   2009 0  2010-01-01 12:12:34 2014-05-01 12:12:34   1 790 days 12:00:00
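
As a sanity check on the MSFT row, the same arithmetic can be done by hand on the two timestamps from the question. This is a small sketch; the variable names are illustrative only.

```python
import pandas as pd

# Dates and max rn for the (MSFT, 2009, 0) group, taken from the question.
max_date = pd.Timestamp("2014-05-01T12:12:34")
min_date = pd.Timestamp("2010-01-01T12:12:34")
max_rn = 1

span = max_date - min_date      # Timedelta covering the whole group
result = span / (max_rn + 1)    # divide by max(rn) + 1

print(span)    # 1581 days 00:00:00
print(result)  # 790 days 12:00:00
```

The span is 1581 days (four years including the leap year 2012, plus January to April 2014), and half of that is 790.5 days, which matches the `new` column above.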

Or, if necessary, convert the difference of datetimes (timedeltas) to seconds with Series.dt.total_seconds:

df['new1'] = df['max1'].sub(df['min1']).dt.total_seconds().div(df['rn'].add(1))
print (df)
                              min1                max1  rn        new1
ticker fy   fp                                                        
AAPL   2010 0  2010-01-01 12:12:34 2010-01-01 12:12:34   0         0.0
       2011 0  2012-01-01 12:12:34 2012-01-01 12:12:34   1         0.0
GOOG   2010 0  2010-01-01 12:12:34 2010-01-01 12:12:34   0         0.0
MSFT   2009 0  2010-01-01 12:12:34 2014-05-01 12:12:34   1  68299200.0
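
Since the question asks for days rather than seconds, one possible variation is to divide the timedelta column by a one-day `pd.Timedelta`, which yields plain floats. The sketch below uses a small hypothetical stand-in (`agg`) for the aggregated frame and an illustrative column name `new_days`, so that it runs on its own.

```python
import pandas as pd

# Hypothetical stand-in for the aggregated frame: one row per group.
agg = pd.DataFrame(
    {
        "min1": pd.to_datetime(["2010-01-01 12:12:34", "2010-01-01 12:12:34"]),
        "max1": pd.to_datetime(["2010-01-01 12:12:34", "2014-05-01 12:12:34"]),
        "rn": [0, 1],
    },
    index=["GOOG", "MSFT"],
)

# Dividing a timedelta Series by a one-day Timedelta yields float days.
span_days = (agg["max1"] - agg["min1"]) / pd.Timedelta(days=1)
agg["new_days"] = span_days / (agg["rn"] + 1)

print(agg["new_days"])
```

For the MSFT row this gives 790.5, the same value as the seconds result divided by 86400.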

Solution for older pandas versions:

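df['f_date'] = pd.to_datetime(df['f_date'])
# Dict-style aggregation returns MultiIndex columns such as ('f_date', 'min')
df = df.groupby(by=['ticker', 'fy', 'fp']).agg({'f_date':['min','max'],
                                               'rn':'max'})
# Flatten the MultiIndex columns to f_date_min, f_date_max, rn_max
df.columns = df.columns.map('_'.join)
df['new'] = df['f_date_max'].sub(df['f_date_min']).div(df['rn_max'].add(1))
print(df)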
                        f_date_min          f_date_max  rn_max  \
ticker fy   fp                                                   
AAPL   2010 0  2010-01-01 12:12:34 2010-01-01 12:12:34       0   
       2011 0  2012-01-01 12:12:34 2012-01-01 12:12:34       1   
GOOG   2010 0  2010-01-01 12:12:34 2010-01-01 12:12:34       0   
MSFT   2009 0  2010-01-01 12:12:34 2014-05-01 12:12:34       1   

                             new  
ticker fy   fp                    
AAPL   2010 0    0 days 00:00:00  
       2011 0    0 days 00:00:00  
GOOG   2010 0    0 days 00:00:00  
MSFT   2009 0  790 days 12:00:00  

Finally, if necessary, convert the MultiIndex to columns:

df = df.reset_index()
print (df)
  ticker    fy  fp          f_date_min          f_date_max  rn_max  \
0   AAPL  2010   0 2010-01-01 12:12:34 2010-01-01 12:12:34       0   
1   AAPL  2011   0 2012-01-01 12:12:34 2012-01-01 12:12:34       1   
2   GOOG  2010   0 2010-01-01 12:12:34 2010-01-01 12:12:34       0   
3   MSFT  2009   0 2010-01-01 12:12:34 2014-05-01 12:12:34       1   

                new  
0   0 days 00:00:00  
1   0 days 00:00:00  
2   0 days 00:00:00  
3 790 days 12:00:00  
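
The steps above can also be wrapped into a single helper, which may be convenient if the result is going to be joined with other data afterwards. This is a sketch rather than a definitive implementation: the function name `span_per_group` and its `keys` parameter are hypothetical, it relies on named aggregation (pandas 0.25+), and it works on a copy so the input frame is left untouched.

```python
import pandas as pd

def span_per_group(df, keys=("ticker", "fy", "fp")):
    """Return (max(f_date) - min(f_date)) / (max(rn) + 1) per group."""
    out = df.copy()
    out["f_date"] = pd.to_datetime(out["f_date"])
    agg = out.groupby(list(keys)).agg(
        f_date_min=("f_date", "min"),
        f_date_max=("f_date", "max"),
        rn_max=("rn", "max"),
    )
    agg["new"] = (agg["f_date_max"] - agg["f_date_min"]) / (agg["rn_max"] + 1)
    # Turn the group keys back into ordinary columns.
    return agg.reset_index()

# Sample data copied from the question.
sample = pd.DataFrame(
    {
        "ticker": ["MSFT", "AAPL", "MSFT", "AAPL", "GOOG"],
        "fy": [2009, 2010, 2009, 2011, 2010],
        "fp": [0, 0, 0, 0, 0],
        "value": [144, 144, 48, 80, 40],
        "f_date": [
            "2010-01-01T12:12:34",
            "2010-01-01T12:12:34",
            "2014-05-01T12:12:34",
            "2012-01-01T12:12:34",
            "2010-01-01T12:12:34",
        ],
        "rn": [0, 0, 1, 1, 0],
    }
)

result = span_per_group(sample)
print(result)
```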
