
How to group by and aggregate in pandas

I have the following pandas DataFrame:

  index    key                                   start   end     nozzle  tank
  0        2018-01-01 02:00:01 - 02:30:00_1_1    2000    2003    1       1 
  1        2018-01-01 02:00:01 - 02:30:00_1_1    2003    2006    1       1 
  2        2018-01-01 02:00:01 - 02:30:00_1_1    2006    2008    1       1
  3        2018-01-01 02:00:01 - 02:30:00_1_1    2008    2010    1       1
  4        2018-01-01 02:00:01 - 02:30:00_1_1    2010    2012    1       1 
  5        2018-01-01 02:00:01 - 02:30:00_1_2    2002    2009    2       1 
  6        2018-01-01 02:00:01 - 02:30:00_1_2    2009    2011    2       1
  7        2018-01-01 02:00:01 - 02:30:00_1_2    2011    2013    2       1
  8        2018-01-01 02:00:01 - 02:30:00_1_2    2013    2015    2       1
  9        2018-01-01 03:30:01 - 04:00:00_1_3    2020    2022    3       1

Now I want to take the first and last observation of every key and find the difference between the last end and the first start. Where a key has only one observation, it should calculate end - start for that same observation.

The calculation is: nozzle 1 = 2012 - 2000 = 12, nozzle 2 = 2015 - 2002 = 13, nozzle 3 = 2022 - 2020 = 2.

My desired DataFrame would be:

  index   key                                   nozzle_1  nozzle_2  nozzle_3
  0       2018-01-01 02:00:01 - 02:30:00_1_1    12        0         0 
  1       2018-01-01 02:00:01 - 02:30:00_1_2    0         13        0 
  2       2018-01-01 03:30:01 - 04:00:00_1_3    0         0         2

Use:

df1 = (df.groupby(['key','nozzle'])
        .agg({'start':'first','end':'last'})
        .assign(dif = lambda x: x['end'] - x['start'])['dif']
        .unstack(fill_value=0)
        .add_prefix('nozzle_')
        .reset_index()
        .rename_axis(None, axis=1))
print(df1)
                                  key  nozzle_1  nozzle_2  nozzle_3
0  2018-01-01 02:00:01 - 02:30:00_1_1        12         0         0
1  2018-01-01 02:00:01 - 02:30:00_1_2         0        13         0
2  2018-01-01 03:30:01 - 04:00:00_1_3         0         0         2

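Explanation:

  1. First aggregate with agg, taking the first start and the last end of each (key, nozzle) group
  2. Create a new column dif with assign by subtracting start from end, then select that column
  3. Reshape with unstack so each nozzle becomes its own column, filling missing combinations with 0
  4. Rename the columns with add_prefix
  5. Finally, clean up with reset_index (key becomes a regular column) and rename_axis (removes the name of the columns axis)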
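To see what each step produces, the chain can be split into intermediate variables. The following is a minimal sketch that rebuilds the sample data from the question. The names key_a, key_b, key_c, agg, dif and wide are only illustrative and do not appear in the original answer.

```python
import pandas as pd

# Sample data from the question (key_a, key_b, key_c are illustrative names).
key_a = "2018-01-01 02:00:01 - 02:30:00_1_1"
key_b = "2018-01-01 02:00:01 - 02:30:00_1_2"
key_c = "2018-01-01 03:30:01 - 04:00:00_1_3"

df = pd.DataFrame({
    "key": [key_a] * 5 + [key_b] * 4 + [key_c],
    "start": [2000, 2003, 2006, 2008, 2010, 2002, 2009, 2011, 2013, 2020],
    "end": [2003, 2006, 2008, 2010, 2012, 2009, 2011, 2013, 2015, 2022],
    "nozzle": [1] * 5 + [2] * 4 + [3],
    "tank": [1] * 10,
})

# Step 1: first start and last end per (key, nozzle) group.
agg = df.groupby(["key", "nozzle"]).agg({"start": "first", "end": "last"})

# Step 2: difference as a Series indexed by (key, nozzle).
dif = agg.assign(dif=lambda x: x["end"] - x["start"])["dif"]

# Step 3: move the nozzle level into columns; absent pairs become 0.
wide = dif.unstack(fill_value=0)

# Steps 4 and 5: prefix the column names, turn key back into a column,
# and drop the leftover "nozzle" name on the columns axis.
df1 = wide.add_prefix("nozzle_").reset_index().rename_axis(None, axis=1)
print(df1)
```

Note that first and last select rows by position within each group, so this assumes the rows are already in chronological order for every key. A key with a single row simply yields end - start of that row, since the same row is both first and last.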
