Pandas: drop_duplicates with condition

Is there any way to use drop_duplicates together with conditions? For example, let's take the following DataFrame:

import pandas as pd

df = pd.DataFrame({
    'Customer_Name': ['Carl', 'Carl', 'Mark', 'Joe', 'Joe'],
    'Customer_Id': [1000, None, None, None, 50000],
})

Based on this DataFrame, I would like to derive a DataFrame with distinct rows for Customer_Id and Customer_Name:

    Customer_Id Customer_Name
0    1000        Carl
2    NaN         Mark
4    50000       Joe

Unfortunately, I cannot use the drop_duplicates method for this, as it always keeps either the first or the last of the duplicated occurrences.

However, in my case the occurrence to keep differs per customer (see Carl and Joe). Moreover, I cannot simply delete all rows with None in the Customer_Id column, as that would also delete the entry for Mark.
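To illustrate why plain drop_duplicates falls short here: with keep='first' Joe's id is lost (his first row holds NaN), and with keep='last' Carl's id is lost. A minimal sketch of the problem:

```python
import pandas as pd

df = pd.DataFrame({
    'Customer_Name': ['Carl', 'Carl', 'Mark', 'Joe', 'Joe'],
    'Customer_Id': [1000, None, None, None, 50000],
})

# keep='first' keeps rows 0, 2, 3 -> Joe ends up with NaN ...
first = df.drop_duplicates(subset='Customer_Name', keep='first')

# ... and keep='last' keeps rows 1, 2, 4 -> Carl ends up with NaN.
last = df.drop_duplicates(subset='Customer_Name', keep='last')

print(first)
print(last)
```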

I deeply appreciate any help.

This works on your example:

>>> df.groupby('Customer_Name').first().reset_index()
  Customer_Name  Customer_Id
0          Carl       1000.0
1           Joe      50000.0
2          Mark          NaN

Note that first() returns the first non-null value in each column, so missing values are skipped rather than propagated; that is why it works consistently here.
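As a quick check that first() really does skip missing values (rather than returning the positionally first row), the example can be rerun:

```python
import pandas as pd

df = pd.DataFrame({
    'Customer_Name': ['Carl', 'Carl', 'Mark', 'Joe', 'Joe'],
    'Customer_Id': [1000, None, None, None, 50000],
})

# GroupBy.first() returns the first non-null value per column:
# Carl gets 1000 (row 0) and Joe gets 50000 (row 4), even though
# Joe's first row holds NaN.
result = df.groupby('Customer_Name').first().reset_index()
print(result)
```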

This gives you the highest Customer_Id for each customer, or NaN if none is available:

df.groupby('Customer_Name').Customer_Id.max().reset_index()

  Customer_Name  Customer_Id
0          Carl       1000.0
1           Joe      50000.0
2          Mark          NaN
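The first() and max() approaches only coincide on this data because each customer has at most one non-null id. If a customer appeared with two different non-null ids (a hypothetical variation of the example), the two would diverge:

```python
import pandas as pd

# Hypothetical variation: Joe appears with two different non-null ids.
df2 = pd.DataFrame({
    'Customer_Name': ['Joe', 'Joe'],
    'Customer_Id': [50000, 70000],
})

# first() takes the first non-null value in row order; max() takes the largest.
by_first = df2.groupby('Customer_Name')['Customer_Id'].first().iloc[0]
by_max = df2.groupby('Customer_Name')['Customer_Id'].max().iloc[0]
print(by_first, by_max)  # first() keeps 50000, max() keeps 70000
```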
