How to compare and drop rows within groupby in pandas?

Question

I have a df that looks like this:

              datetime                     policyid                   score
0   1970-01-01 00:00:01.593560812         9876policyID1234567890        0 
1   1970-01-01 00:00:01.593560814         9876policyID1234567890        0 
2   1970-01-01 00:00:01.593560958         9876policyID1234567890        1
3   1970-01-01 00:00:01.593560964         9876policyID1234567890        1

I want to group by policyid and score, but keep only the row with the latest timestamp for each policyid and score combination.

I am doing the groupby like so:

df.groupby(['policyid','score'])

At this point, I am not sure how to compare the timestamps between rows within each group and keep the row with the later timestamp.

New DF should look like this:

              datetime                     policyid                   score
1   1970-01-01 00:00:01.593560814         9876policyID1234567890        0 
3   1970-01-01 00:00:01.593560964         9876policyID1234567890        1

Thank you in advance.

Answer 1

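You can use sort_values, then drop_duplicates. Sorting by datetime puts the latest row of each (policyid, score) pair last, and keep='last' retains that row: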

df = df.sort_values('datetime').drop_duplicates(['policyid', 'score'], keep='last')
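
To make the one-liner above easier to try out, here is a minimal, self-contained sketch. It rebuilds the sample frame from the question (assuming the datetime column is a real datetime64 column rather than strings) and applies the sort_values / drop_duplicates approach. The second part is an alternative that is not from the answer: it uses groupby with idxmax, which may read closer to the groupby the question started with. The variable names latest and latest_via_groupby are illustrative only.

```python
import pandas as pd

# Sample data copied from the question; the datetime column is assumed
# to be datetime64 (parsed here with pd.to_datetime).
df = pd.DataFrame({
    "datetime": pd.to_datetime([
        "1970-01-01 00:00:01.593560812",
        "1970-01-01 00:00:01.593560814",
        "1970-01-01 00:00:01.593560958",
        "1970-01-01 00:00:01.593560964",
    ]),
    "policyid": ["9876policyID1234567890"] * 4,
    "score": [0, 0, 1, 1],
})

# Approach from the answer: sort ascending by time, then keep the last
# (latest) row for each (policyid, score) pair.
latest = (
    df.sort_values("datetime")
      .drop_duplicates(["policyid", "score"], keep="last")
)

# Alternative (not from the answer): for each group, find the index label
# of the row with the maximum datetime, then select those rows.
idx = df.groupby(["policyid", "score"])["datetime"].idxmax()
latest_via_groupby = df.loc[idx]

print(latest)
print(latest_via_groupby)
```

Both variants keep the original index labels, so the result can be matched back to the source rows. If the datetime column is stored as strings, converting it with pd.to_datetime first is the safer choice before sorting or calling idxmax.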

Answer 1 (accepted, 2020-07-01 00:12:10)
