
Aggregate 15-minute timestamps to hourly and find sum, avg and max for multiple columns in pandas

I have a dataframe whose period_start_time column advances in 15-minute steps. I need to aggregate it to 1 hour and calculate the sum, avg and max for almost every column in the dataframe (it has about 20 columns).

    PERIOD_START_TIME       ID      val1     val2
    06.21.2017 22:15:00      12       3        0
    06.21.2017 22:30:00      12       5        6
    06.21.2017 22:45:00      12       0        3
    06.21.2017 23:00:00      12       5        2
    ...
    06.21.2017 22:15:00      15       9        2
    06.21.2017 22:30:00      15       0        2
    06.21.2017 22:45:00      15       1        5
    06.21.2017 23:00:00      15       0        1
    ...

Desired output:

    PERIOD_START_TIME       ID    val1(avg) val1(sum) val1(max) ...
    06.21.2017 22:00:00     12      3.25       13         5
    ...
    06.21.2017 23:00:00     15      2.25       10         9    ...

The same goes for val2 and for every other column in the dataframe. I have no idea how to group by period start time per hour rather than per whole day, and I don't know where to start.

I believe you need Series.dt.floor to round the timestamps down to hours, and then aggregate with agg:

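import pandas as pd
df['PERIOD_START_TIME'] = pd.to_datetime(df['PERIOD_START_TIME'], format='%m.%d.%Y %H:%M:%S')  # only if still text; format assumed from the sample
# floor in place so recent pandas does not try to aggregate the raw timestamp column ('H' is spelled 'h' from pandas 2.2)
df['PERIOD_START_TIME'] = df['PERIOD_START_TIME'].dt.floor('H')
df = df.groupby(['PERIOD_START_TIME','ID']).agg(['mean','sum', 'max'])
# flatten the MultiIndex columns: ('val1', 'mean') -> 'val1_mean'
df.columns = df.columns.map('_'.join)
print (df)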
                        val1_mean  val1_sum  val1_max  val2_mean  val2_sum  \
PERIOD_START_TIME   ID                                                       
2017-06-21 22:00:00 12   2.666667         8         5          3         9   
                    15   3.333333        10         9          3         9   
2017-06-21 23:00:00 12   5.000000         5         5          2         2   
                    15   0.000000         0         0          1         1   

                        val2_max  
PERIOD_START_TIME   ID            
2017-06-21 22:00:00 12         6  
                    15         5  
2017-06-21 23:00:00 12         2  
                    15         1  

df = df.reset_index()
print (df)
  PERIOD_START_TIME  ID  val1_mean  val1_sum  val1_max  val2_mean  val2_sum  \
0  2017-06-21 22:00  12   2.666667         8         5          3         9   
1  2017-06-21 22:00  15   3.333333        10         9          3         9   
2  2017-06-21 23:00  12   5.000000         5         5          2         2   
3  2017-06-21 23:00  15   0.000000         0         0          1         1   

   val2_max  
0         6  
1         5  
2         2  
3         1  
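
For anyone who wants to reproduce this end to end, here is a minimal sketch built only from the rows shown in the question. It assumes the timestamps are text in month.day.year order, and it uses the lowercase 'h' alias that newer pandas releases prefer over 'H'. The names hour and hourly are illustrative, not from the original answer. The value columns are selected explicitly so the raw timestamp column is not aggregated along with them.

```python
import pandas as pd

# Sample rows copied from the question (only the rows that are shown there).
df = pd.DataFrame({
    "PERIOD_START_TIME": [
        "06.21.2017 22:15:00", "06.21.2017 22:30:00",
        "06.21.2017 22:45:00", "06.21.2017 23:00:00",
    ] * 2,
    "ID": [12] * 4 + [15] * 4,
    "val1": [3, 5, 0, 5, 9, 0, 1, 0],
    "val2": [0, 6, 3, 2, 2, 2, 5, 1],
})

# Assumption: the strings are month.day.year, as the sample suggests.
df["PERIOD_START_TIME"] = pd.to_datetime(
    df["PERIOD_START_TIME"], format="%m.%d.%Y %H:%M:%S"
)

# Round every timestamp down to the start of its hour.
hour = df["PERIOD_START_TIME"].dt.floor("h")

# Aggregate only the value columns, per hour and per ID.
hourly = df.groupby([hour, "ID"])[["val1", "val2"]].agg(["mean", "sum", "max"])

# Flatten ('val1', 'mean') style column tuples into 'val1_mean'.
hourly.columns = hourly.columns.map("_".join)
hourly = hourly.reset_index()
print(hourly)
```

With about 20 columns, the explicit list could be replaced by something like df.columns.difference(['PERIOD_START_TIME', 'ID']), assuming every remaining column is numeric.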

Similarly, you can convert PERIOD_START_TIME to an hourly pandas Period and group on that:

df['PERIOD_START_TIME'] = df['PERIOD_START_TIME'].dt.to_period('H')
df.groupby(['PERIOD_START_TIME', 'ID']).agg(['max', 'min', 'mean']).reset_index()
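
As a rough illustration of the Period variant, the sketch below runs it on the ID 12 rows from the question. The datetime format is assumed from the sample, by_period is an illustrative name, and the aggregations are set to mean, sum and max to match what the question asks for. The final conversion back to timestamps is optional and only shown because Period values may be inconvenient in later steps.

```python
import pandas as pd

# Sample rows for one ID, copied from the question.
df = pd.DataFrame({
    "PERIOD_START_TIME": pd.to_datetime(
        ["06.21.2017 22:15:00", "06.21.2017 22:30:00",
         "06.21.2017 22:45:00", "06.21.2017 23:00:00"],
        format="%m.%d.%Y %H:%M:%S",  # assumed month.day.year layout
    ),
    "ID": [12, 12, 12, 12],
    "val1": [3, 5, 0, 5],
    "val2": [0, 6, 3, 2],
})

# Replace each timestamp by the hourly Period that contains it.
df["PERIOD_START_TIME"] = df["PERIOD_START_TIME"].dt.to_period("h")

# Group on the Period and the ID; the remaining columns are the values.
by_period = df.groupby(["PERIOD_START_TIME", "ID"]).agg(["mean", "sum", "max"])
by_period.columns = by_period.columns.map("_".join)
by_period = by_period.reset_index()

# Optional: turn the Period keys back into the timestamp at the start of each hour.
by_period["PERIOD_START_TIME"] = by_period["PERIOD_START_TIME"].dt.to_timestamp()
print(by_period)
```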


 