I have a data frame that looks like this.
ATM ID  Ref no  Timestamp
1       11      2020/02/01 15:10:23
1       11      2020/02/01 15:11:03
1       111     2020/02/06 17:45:41
1       111     2020/02/06 18:11:03
2       22      2020/02/07 15:11:03
2       22      2020/02/07 15:25:01
2       22      2020/02/07 15:38:51
2       222     2020/02/07 15:11:03
I would like to group it by ATM ID and Ref no so that each ATM ID / Ref no combination returns a single row, together with the duration between the first and last timestamps for that combination.
Expected output:
ATM ID  Ref no  Timestamp            Diff
1       11      2020/02/01 15:11:03  00:00:40
1       111     2020/02/06 18:11:03  00:25:22
2       22      2020/02/07 15:38:51  00:27:48
2       222     2020/02/07 15:11:03  00:00:00
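Use a custom lambda function in GroupBy.agg to subtract the first value from the last value in each group. The Timestamp column must hold datetimes (convert it with pd.to_datetime if it is stored as strings):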
df1 = (df.groupby(['ATM ID', 'Ref no'])['Timestamp']
         .agg(lambda x: x.iat[-1] - x.iat[0])
         .reset_index(name='diff'))
print(df1)
   ATM ID  Ref no     diff
0       1      11 00:00:40
1       1     111 00:25:22
2       2      22 00:27:48
3       2     222 00:00:00
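The subtraction above only works when Timestamp holds datetimes rather than strings. A minimal, self-contained sketch, assuming the column arrives as text in the YYYY/MM/DD HH:MM:SS layout shown in the question (the sample frame is rebuilt by hand here, and the exact printed form of the durations can vary between pandas versions):

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "ATM ID": [1, 1, 1, 1, 2, 2, 2, 2],
    "Ref no": [11, 11, 111, 111, 22, 22, 22, 222],
    "Timestamp": [
        "2020/02/01 15:10:23", "2020/02/01 15:11:03",
        "2020/02/06 17:45:41", "2020/02/06 18:11:03",
        "2020/02/07 15:11:03", "2020/02/07 15:25:01",
        "2020/02/07 15:38:51", "2020/02/07 15:11:03",
    ],
})

# Assumption: the column is stored as strings, so parse it before subtracting.
df["Timestamp"] = pd.to_datetime(df["Timestamp"], format="%Y/%m/%d %H:%M:%S")

# Last value minus first value within each (ATM ID, Ref no) group,
# relying on the rows already being in chronological order.
df1 = (df.groupby(["ATM ID", "Ref no"])["Timestamp"]
         .agg(lambda x: x.iat[-1] - x.iat[0])
         .reset_index(name="diff"))
print(df1)
```

If the rows are not already in chronological order within each group, sorting with df.sort_values('Timestamp') beforehand keeps "first" and "last" meaningful.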
Or aggregate last and first, then create the new column with DataFrame.assign:
df1 = (df.groupby(['ATM ID', 'Ref no'])['Timestamp']
         .agg(['last', 'first'])
         .assign(diff=lambda x: x.pop('last') - x.pop('first'))
         .reset_index())
print(df1)
   ATM ID  Ref no     diff
0       1      11 00:00:40
1       1     111 00:25:22
2       2      22 00:27:48
3       2     222 00:00:00
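Neither result above keeps the Timestamp column that the expected output in the question shows. One possible way to retain the last timestamp next to the difference is named aggregation. This is a sketch rather than part of the original answer: the helper column name first_ts is made up for illustration, and sorting by Timestamp first is an added assumption so that "first" and "last" mean earliest and latest.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "ATM ID": [1, 1, 1, 1, 2, 2, 2, 2],
    "Ref no": [11, 11, 111, 111, 22, 22, 22, 222],
    "Timestamp": [
        "2020/02/01 15:10:23", "2020/02/01 15:11:03",
        "2020/02/06 17:45:41", "2020/02/06 18:11:03",
        "2020/02/07 15:11:03", "2020/02/07 15:25:01",
        "2020/02/07 15:38:51", "2020/02/07 15:11:03",
    ],
})
df["Timestamp"] = pd.to_datetime(df["Timestamp"], format="%Y/%m/%d %H:%M:%S")

# Named aggregation: keep the last timestamp under the original column name
# and the first timestamp in a temporary helper column (hypothetical name).
out = (df.sort_values("Timestamp")
         .groupby(["ATM ID", "Ref no"], as_index=False)
         .agg(Timestamp=("Timestamp", "last"),
              first_ts=("Timestamp", "first")))

# Duration between the first and last timestamp of each group.
out["Diff"] = out["Timestamp"] - out["first_ts"]
out = out.drop(columns="first_ts")
print(out)
```

The Diff column here is a timedelta column, so it can still be compared, summed, or converted with out['Diff'].dt.total_seconds() if plain seconds are more convenient.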