I have the following pandas DataFrame:
index  key                                 start  end   nozzle  tank
0      2018-01-01 02:00:01 - 02:30:00_1_1  2000   2003  1       1
1      2018-01-01 02:00:01 - 02:30:00_1_1  2003   2006  1       1
2      2018-01-01 02:00:01 - 02:30:00_1_1  2006   2008  1       1
3      2018-01-01 02:00:01 - 02:30:00_1_1  2008   2010  1       1
4      2018-01-01 02:00:01 - 02:30:00_1_1  2010   2012  1       1
5      2018-01-01 02:00:01 - 02:30:00_1_2  2002   2009  2       1
6      2018-01-01 02:00:01 - 02:30:00_1_2  2009   2011  2       1
7      2018-01-01 02:00:01 - 02:30:00_1_2  2011   2013  2       1
8      2018-01-01 02:00:01 - 02:30:00_1_2  2013   2015  2       1
9      2018-01-01 03:30:01 - 04:00:00_1_3  2020   2022  3       1
Now I want to take the first and last observation of every key and find the difference, i.e. the end of the last observation minus the start of the first one. Where a key has only one observation, the difference should be end - start of that same observation.
For example, the calculation for nozzle 1 is 2012 - 2000 = 12, and for nozzle 2 it is 2015 - 2002 = 13.
My desired DataFrame would be:
index  key                                 nozzle_1  nozzle_2  nozzle_3
0      2018-01-01 02:00:01 - 02:30:00_1_1  12        0         0
1      2018-01-01 02:00:01 - 02:30:00_1_2  0         13        0
2      2018-01-01 03:30:01 - 04:00:00_1_3  0         0         2
Use:
df1 = (df.groupby(['key', 'nozzle'])
         .agg({'start': 'first', 'end': 'last'})
         .assign(dif=lambda x: x['end'] - x['start'])['dif']
         .unstack(fill_value=0)
         .add_prefix('nozzle_')
         .reset_index()
         .rename_axis(None, axis=1))
print(df1)
                                  key  nozzle_1  nozzle_2  nozzle_3
0  2018-01-01 02:00:01 - 02:30:00_1_1        12         0         0
1  2018-01-01 02:00:01 - 02:30:00_1_2         0        13         0
2  2018-01-01 03:30:01 - 04:00:00_1_3         0         0         2
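One caveat worth noting: 'first' and 'last' follow the row order within each group, so the result above relies on the rows already being ordered by start. A minimal sketch of guarding against unordered input follows; the shortened keys "k1"/"k2" and the shuffled rows are hypothetical sample data, not from the question.

```python
import pandas as pd

# Hypothetical, deliberately shuffled sample (keys shortened for readability)
df = pd.DataFrame({
    "key": ["k1", "k1", "k1", "k2"],
    "start": [2006, 2000, 2003, 2020],
    "end": [2008, 2003, 2006, 2022],
    "nozzle": [1, 1, 1, 3],
})

# Sort so that 'first'/'last' really pick the earliest and latest observation
ordered = df.sort_values(["key", "nozzle", "start"])

result = (ordered.groupby(["key", "nozzle"])
                 .agg({"start": "first", "end": "last"})
                 .assign(dif=lambda x: x["end"] - x["start"])["dif"]
                 .unstack(fill_value=0)
                 .add_prefix("nozzle_")
                 .reset_index()
                 .rename_axis(None, axis=1))
print(result)
```

If the data is guaranteed to be in chronological order already, the extra sort_values call is unnecessary.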
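Explanation:
1. groupby on key and nozzle, then agg with first for start and last for end.
2. assign a new column dif by subtraction (end - start) and select it.
3. unstack to move the nozzle level into columns, filling missing combinations with 0.
4. add_prefix to rename the columns to nozzle_1, nozzle_2, ...
5. reset_index to turn key back into a column, and rename_axis to remove the leftover columns name.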
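The chain can also be broken into separate steps to inspect each intermediate result. The sketch below rebuilds the sample data from the question; the intermediate names bounds, dif and wide are only illustrative.

```python
import pandas as pd

key1 = "2018-01-01 02:00:01 - 02:30:00_1_1"
key2 = "2018-01-01 02:00:01 - 02:30:00_1_2"
key3 = "2018-01-01 03:30:01 - 04:00:00_1_3"

# Sample data as given in the question
df = pd.DataFrame({
    "key": [key1] * 5 + [key2] * 4 + [key3],
    "start": [2000, 2003, 2006, 2008, 2010, 2002, 2009, 2011, 2013, 2020],
    "end": [2003, 2006, 2008, 2010, 2012, 2009, 2011, 2013, 2015, 2022],
    "nozzle": [1] * 5 + [2] * 4 + [3],
    "tank": [1] * 10,
})

# Step 1: first start and last end for each (key, nozzle) pair
bounds = df.groupby(["key", "nozzle"]).agg({"start": "first", "end": "last"})

# Step 2: the difference, a Series indexed by (key, nozzle)
dif = bounds["end"] - bounds["start"]

# Step 3: pivot the nozzle level into columns; absent pairs become 0
wide = dif.unstack(fill_value=0)

# Step 4: tidy the column names and restore key as a regular column
df1 = wide.add_prefix("nozzle_").reset_index().rename_axis(None, axis=1)
print(df1)
```

A group with a single row, such as nozzle 3 here, needs no special handling: first and last both point at that one row, so the result is simply its end - start.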