Group By Customer Id and Also Take Date Column With Most Recent Value In Pandas

Question

I am new to working with Python and Pandas, and I have a question about grouping a dataframe.

I am grouping the dataframe by id, but if there are two rows for one id, I only want to keep the row that has the most recent value in the category_timestamp column.

This is what the results look like in the dataframe:

id          date_cancelled       owner_id   reason                  category_timestamp
610040      2020-06-23 15:26:32  345198     No Longer Qualifies     2020-06-23 15:26:15       
122672      2020-06-23 15:30:35  28950      Billing Cancellation    2020-06-23 15:30:35
122672      2020-06-23 15:30:35  28950      No Contact              2018-04-26 8:45:17
862708      2020-06-23 17:31:03  327378     Changed Mind/Persuaded  2020-06-23 17:30:50
436932      2020-06-25 1:07:02   28950      No Contact              2019-08-09 8:02:05

So for the id that appears twice (122672), I only want to keep the row with the most recent category_timestamp.

How do I add this to this line of code?

merged_df.groupby(['contact_id'])
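A minimal sketch of one way that groupby line could be extended, assuming the id column is named contact_id (as in the line above) and that category_timestamp can be parsed by pd.to_datetime. groupby(...)["category_timestamp"].idxmax() returns, for each id, the index label of its newest row, and .loc then selects exactly those rows. The sample frame below is rebuilt from the rows shown in the question, keeping only the columns needed, with the hour values zero-padded.

```python
import pandas as pd

# Sample rows from the question (only the columns needed; hours zero-padded).
merged_df = pd.DataFrame({
    "contact_id": [610040, 122672, 122672, 862708, 436932],
    "reason": ["No Longer Qualifies", "Billing Cancellation", "No Contact",
               "Changed Mind/Persuaded", "No Contact"],
    "category_timestamp": ["2020-06-23 15:26:15", "2020-06-23 15:30:35",
                           "2018-04-26 08:45:17", "2020-06-23 17:30:50",
                           "2019-08-09 08:02:05"],
})

# Parse the strings so idxmax compares timestamps chronologically.
merged_df["category_timestamp"] = pd.to_datetime(merged_df["category_timestamp"])

# For each contact_id, idxmax gives the index label of the newest row;
# .loc then picks those rows out of the original frame.
latest = merged_df.loc[merged_df.groupby("contact_id")["category_timestamp"].idxmax()]
print(latest)
```

This relies on the index labels being unique (the default RangeIndex is); with duplicate labels, .loc would return extra rows.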

Thanks!

Answer 1

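I think it would be easier to sort the rows by category_timestamp, newest first, and then drop duplicate ids, keeping the first row for each one. Sorting by date_cancelled does not help here because both 122672 rows have the same date_cancelled, and deduplicating on owner_id would be wrong because ids 122672 and 436932 share owner_id 28950.

import pandas as pd
df['category_timestamp'] = pd.to_datetime(df['category_timestamp'])  # parse so sorting is chronological
df = df.sort_values('category_timestamp', ascending=False)  # newest rows first
df = df.drop_duplicates(subset='id', keep='first')  # keep the newest row per id
print(df)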
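A self-contained check of the sort-then-deduplicate idea on the rows from the question (columns trimmed, hours zero-padded). It sorts on category_timestamp and deduplicates on id, and for comparison also deduplicates on owner_id to show which row that would lose.

```python
import pandas as pd

# Rows from the question; timestamps written with zero-padded hours.
df = pd.DataFrame({
    "id": [610040, 122672, 122672, 862708, 436932],
    "owner_id": [345198, 28950, 28950, 327378, 28950],
    "reason": ["No Longer Qualifies", "Billing Cancellation", "No Contact",
               "Changed Mind/Persuaded", "No Contact"],
    "category_timestamp": pd.to_datetime([
        "2020-06-23 15:26:15", "2020-06-23 15:30:35", "2018-04-26 08:45:17",
        "2020-06-23 17:30:50", "2019-08-09 08:02:05",
    ]),
})

# Newest category_timestamp first, then keep the first row seen for each id.
latest = (
    df.sort_values("category_timestamp", ascending=False)
      .drop_duplicates(subset="id", keep="first")
)
print(latest)

# Deduplicating on owner_id instead also collapses ids 122672 and 436932,
# because both share owner_id 28950, so id 436932 disappears.
by_owner = (
    df.sort_values("category_timestamp", ascending=False)
      .drop_duplicates(subset="owner_id", keep="first")
)
print(len(latest), len(by_owner))  # prints: 4 3
```

Unlike the groupby/idxmax approach, this does not depend on the index labels being unique, but the result comes back in timestamp order rather than id order.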

Answer 1 was accepted (2020-06-25 14:23:56).