Group By Customer Id and Also Take Date Column With Most Recent Value In Pandas

I am new to working with Python and Pandas and I have a question about grouping a dataframe I have.

I am grouping the dataframe by id but if there are two rows for one id, I only want to take the row that has the most recent value in the category_timestamp column.

This is what the results look like in the dataframe:

id          date_cancelled       owner_id   reason                  category_timestamp
610040      2020-06-23 15:26:32  345198     No Longer Qualifies     2020-06-23 15:26:15       
122672      2020-06-23 15:30:35  28950      Billing Cancellation    2020-06-23 15:30:35
122672      2020-06-23 15:30:35  28950      No Contact              2018-04-26 8:45:17
862708      2020-06-23 17:31:03  327378     Changed Mind/Persuaded  2020-06-23 17:30:50
436932      2020-06-25 1:07:02   28950      No Contact              2019-08-09 8:02:05

So for the id that appears twice (122672), I only want to display the row with the most recent category_timestamp.

How do I add this to this line of code?

merged_df.groupby(['contact_id']) 

Thanks!

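I think it would be easier to just sort the rows by category_timestamp and then drop the duplicate ids, keeping the most recent row for each. This assumes category_timestamp holds real datetimes; if it is stored as text, convert it with pd.to_datetime first.

# Sort so the most recent category_timestamp comes first
df = df.sort_values('category_timestamp', ascending=False)
# Keep only the first (most recent) row for each id
df = df.drop_duplicates(subset='id', keep='first')
print(df)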
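
Since the question asks specifically about groupby, here is a minimal sketch of a groupby-based alternative, not the answerer's method. It assumes the column names shown in the question's table (id, category_timestamp) and rebuilds the sample rows by hand; the names latest_idx and latest are made up for illustration.

```python
import pandas as pd

# Sample rows copied from the table in the question
df = pd.DataFrame({
    "id": [610040, 122672, 122672, 862708, 436932],
    "date_cancelled": [
        "2020-06-23 15:26:32", "2020-06-23 15:30:35", "2020-06-23 15:30:35",
        "2020-06-23 17:31:03", "2020-06-25 1:07:02",
    ],
    "owner_id": [345198, 28950, 28950, 327378, 28950],
    "reason": [
        "No Longer Qualifies", "Billing Cancellation", "No Contact",
        "Changed Mind/Persuaded", "No Contact",
    ],
    "category_timestamp": [
        "2020-06-23 15:26:15", "2020-06-23 15:30:35", "2018-04-26 8:45:17",
        "2020-06-23 17:30:50", "2019-08-09 8:02:05",
    ],
})

# Parse the text timestamps so comparisons are chronological, not alphabetical
df["category_timestamp"] = pd.to_datetime(
    df["category_timestamp"], format="%Y-%m-%d %H:%M:%S"
)

# For each id, find the index label of the row with the latest timestamp
latest_idx = df.groupby("id")["category_timestamp"].idxmax()

# Select those rows and restore the original row order
latest = df.loc[latest_idx].sort_index()
print(latest)
```

With this sample data, id 122672 should keep its "Billing Cancellation" row, because that row has the later category_timestamp. Unlike sort-then-drop, this leaves the original row order untouched, which may or may not matter for your use case.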
