I have a df that looks like this:
                       datetime                policyid  score
0 1970-01-01 00:00:01.593560812  9876policyID1234567890      0
1 1970-01-01 00:00:01.593560814  9876policyID1234567890      0
2 1970-01-01 00:00:01.593560958  9876policyID1234567890      1
3 1970-01-01 00:00:01.593560964  9876policyID1234567890      1
I want to group by policyid and score, but keep only the row with the greatest timestamp for each policyid and score combination.
I am doing the groupby like so:
df.groupby(['policyid','score'])
At this point, I am not sure how to compare the timestamps between rows and keep the row with the greater timestamp.
New DF should look like this:
                       datetime                policyid  score
1 1970-01-01 00:00:01.593560814  9876policyID1234567890      0
3 1970-01-01 00:00:01.593560964  9876policyID1234567890      1
Thank you in advance.
You can use sort_values, then drop_duplicates:
df = df.sort_values('datetime').drop_duplicates(['policyid', 'score'], keep='last')
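For reference, here is a minimal sketch of that one-liner run against the sample data from the question. The DataFrame is rebuilt by hand, assuming datetime is a datetime64 column, and the names latest and latest_by_group are just illustrative. Since the question started from a groupby, the second variant shows one possible alternative using idxmax, which should pick the same rows here.

```python
import pandas as pd

# Rebuild the sample frame from the question (assumes datetime64 dtype).
df = pd.DataFrame({
    "datetime": pd.to_datetime([
        "1970-01-01 00:00:01.593560812",
        "1970-01-01 00:00:01.593560814",
        "1970-01-01 00:00:01.593560958",
        "1970-01-01 00:00:01.593560964",
    ]),
    "policyid": ["9876policyID1234567890"] * 4,
    "score": [0, 0, 1, 1],
})

# Sort ascending by time, then keep the last (latest) row of each
# (policyid, score) pair. The original index labels are preserved.
latest = df.sort_values("datetime").drop_duplicates(
    ["policyid", "score"], keep="last"
)

# Alternative sketch: per group, find the index label of the max
# timestamp and select those rows from the original frame.
latest_by_group = df.loc[
    df.groupby(["policyid", "score"])["datetime"].idxmax()
]

print(latest)
print(latest_by_group)
```

Both variants keep the original index labels, so add .reset_index(drop=True) if you want the result renumbered from 0.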