How to calculate the time difference between the first and last record of each group in a pandas DataFrame

I have a DataFrame that looks like this:

ATM ID  Ref no  Timestamp
1       11      2020/02/01 15:10:23
1       11      2020/02/01 15:11:03
1       111     2020/02/06 17:45:41
1       111     2020/02/06 18:11:03
2       22      2020/02/07 15:11:03
2       22      2020/02/07 15:25:01
2       22      2020/02/07 15:38:51
2       222     2020/02/07 15:11:03

I would like to group it by ATM ID and Ref no so that each ATM ID/Ref no combination returns a single row, together with the duration between the first and last timestamp of that combination.

Expected output:

ATM ID  Ref no  Timestamp            Diff
1       11      2020/02/01 15:11:03  00:00:40
1       111     2020/02/06 18:11:03  00:25:22
2       22      2020/02/07 15:38:51  00:27:48
2       222     2020/02/07 15:11:03  00:00:00
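The answers below subtract timestamps, which only works on datetime values. A minimal sketch for rebuilding the sample data from the question so the answers can be run, assuming the timestamps start out as strings in the YYYY/MM/DD HH:MM:SS form shown (for example after reading a CSV):

```python
import pandas as pd

# Sample data from the question; column names are kept exactly as shown
df = pd.DataFrame({
    'ATM ID': [1, 1, 1, 1, 2, 2, 2, 2],
    'Ref no': [11, 11, 111, 111, 22, 22, 22, 222],
    'Timestamp': ['2020/02/01 15:10:23', '2020/02/01 15:11:03',
                  '2020/02/06 17:45:41', '2020/02/06 18:11:03',
                  '2020/02/07 15:11:03', '2020/02/07 15:25:01',
                  '2020/02/07 15:38:51', '2020/02/07 15:11:03'],
})

# Parse the strings into datetime64 values so they can be subtracted
df['Timestamp'] = pd.to_datetime(df['Timestamp'], format='%Y/%m/%d %H:%M:%S')
print(df.dtypes)
```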

Use a custom lambda function in GroupBy.agg that subtracts the first value of each group from the last one:

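import pandas as pd

# subtraction needs datetime64 values, not strings
df['Timestamp'] = pd.to_datetime(df['Timestamp'])

# last row minus first row of each (ATM ID, Ref no) group
df1 = (df.groupby(['ATM ID','Ref no'])['Timestamp']
         .agg(lambda x: x.iat[-1] - x.iat[0])
         .reset_index(name='diff'))
print(df1)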
   ATM ID  Ref no     diff
0       1      11 00:00:40
1       1     111 00:25:22
2       2      22 00:27:48
3       2     222 00:00:00
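x.iat[-1] - x.iat[0] uses row positions, so it assumes each group's rows are already in chronological order. If that is not guaranteed, one alternative (swapped in here, not part of the original answer) is to subtract the group minimum from the group maximum. A minimal sketch on the same sample data, with the rows deliberately reversed to show the result does not change:

```python
import pandas as pd

df = pd.DataFrame({
    'ATM ID': [1, 1, 1, 1, 2, 2, 2, 2],
    'Ref no': [11, 11, 111, 111, 22, 22, 22, 222],
    'Timestamp': pd.to_datetime([
        '2020/02/01 15:10:23', '2020/02/01 15:11:03',
        '2020/02/06 17:45:41', '2020/02/06 18:11:03',
        '2020/02/07 15:11:03', '2020/02/07 15:25:01',
        '2020/02/07 15:38:51', '2020/02/07 15:11:03']),
})

# Reverse the rows so the newest timestamp comes first in every group
rev = df.iloc[::-1]

# max - min does not depend on row order, unlike last - first
g = rev.groupby(['ATM ID', 'Ref no'])['Timestamp']
df1 = (g.max() - g.min()).reset_index(name='diff')
print(df1)
```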

Or aggregate the last and first values, then create the new column with DataFrame.assign (pop also drops the helper last and first columns):

df1 = (df.groupby(['ATM ID','Ref no'])['Timestamp']
         .agg(['last','first'])
         .assign(diff=lambda x: x.pop('last') - x.pop('first'))
         .reset_index())
print(df1)
   ATM ID  Ref no     diff
0       1      11 00:00:40
1       1     111 00:25:22
2       2      22 00:27:48
3       2     222 00:00:00
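Neither answer keeps the Timestamp column shown in the expected output. A minimal sketch that does, using named aggregation (available since pandas 0.25) to keep the last timestamp of each group, then rendering the difference as an HH:MM:SS string for display. The helper name hms is hypothetical, and once the column holds strings it can no longer be used for timedelta arithmetic:

```python
import pandas as pd

df = pd.DataFrame({
    'ATM ID': [1, 1, 1, 1, 2, 2, 2, 2],
    'Ref no': [11, 11, 111, 111, 22, 22, 22, 222],
    'Timestamp': pd.to_datetime([
        '2020/02/01 15:10:23', '2020/02/01 15:11:03',
        '2020/02/06 17:45:41', '2020/02/06 18:11:03',
        '2020/02/07 15:11:03', '2020/02/07 15:25:01',
        '2020/02/07 15:38:51', '2020/02/07 15:11:03']),
})

# Named aggregation: last timestamp (kept) and first timestamp (helper) per group
out = (df.groupby(['ATM ID', 'Ref no'], as_index=False)
         .agg(Timestamp=('Timestamp', 'last'), First=('Timestamp', 'first')))

# Duration per group; pop removes the helper First column
out['Diff'] = out['Timestamp'] - out.pop('First')

def hms(td):
    """Hypothetical helper: format a non-negative Timedelta as HH:MM:SS."""
    s = int(td.total_seconds())
    return f"{s // 3600:02d}:{s % 3600 // 60:02d}:{s % 60:02d}"

# Display-only formatting to match the expected output in the question
out['Diff'] = out['Diff'].map(hms)
print(out)
```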
