Is there any way to use drop_duplicates
together with conditions? For example, take the following DataFrame:
import pandas as pd

df = pd.DataFrame({
    'Customer_Name': ['Carl', 'Carl', 'Mark', 'Joe', 'Joe'],
    'Customer_Id': [1000, None, None, None, 50000]
})
Based on this DataFrame I would like to derive a DataFrame with one distinct row per Customer_Name, keeping the Customer_Id where available:

   Customer_Id Customer_Name
0       1000.0          Carl
2          NaN          Mark
4      50000.0           Joe
Unfortunately, I cannot use the drop_duplicates
method for this, as it always keeps either the first or the last occurrence of each duplicate (keep='first' / keep='last').
In my case, the row to keep differs per customer (see Carl and Joe). Moreover, I cannot simply delete all rows with a missing Customer_Id, as that would also remove the entry for Mark.
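To make the problem concrete, here is a quick check (using the DataFrame from the question) showing that neither keep side of drop_duplicates gives the desired result:

```python
import pandas as pd

df = pd.DataFrame({
    'Customer_Name': ['Carl', 'Carl', 'Mark', 'Joe', 'Joe'],
    'Customer_Id': [1000, None, None, None, 50000]
})

# keep='first' keeps Joe's NaN row (index 3) and drops the 50000 row
print(df.drop_duplicates(subset='Customer_Name', keep='first'))

# keep='last' keeps Carl's NaN row (index 1) and drops the 1000 row
print(df.drop_duplicates(subset='Customer_Name', keep='last'))
```

Either way, one of the known Customer_Id values is lost.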
I deeply appreciate any help.
This works on your example:
>>> df.groupby('Customer_Name').first().reset_index()
  Customer_Name  Customer_Id
0          Carl       1000.0
1           Joe      50000.0
2          Mark          NaN
But I had to check how first()
treats missing values to be sure it works consistently.
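For the record, GroupBy.first() returns the first non-null value in each column, so missing values are skipped consistently: Joe's NaN row does not mask his 50000, and Mark stays NaN only because he has no non-null id at all. A quick check with the question's data:

```python
import pandas as pd

df = pd.DataFrame({
    'Customer_Name': ['Carl', 'Carl', 'Mark', 'Joe', 'Joe'],
    'Customer_Id': [1000, None, None, None, 50000]
})

res = df.groupby('Customer_Name').first().reset_index()

# first() skipped Joe's NaN row (index 3) and picked 50000 (index 4)
assert res.loc[res['Customer_Name'] == 'Joe', 'Customer_Id'].item() == 50000.0

# Mark has no non-null Customer_Id, so his result is NaN
assert pd.isna(res.loc[res['Customer_Name'] == 'Mark', 'Customer_Id'].item())
```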
This will give you only the highest Customer_Id per customer, or NaN if none is available:
df.groupby('Customer_Name').Customer_Id.max().reset_index()
  Customer_Name  Customer_Id
0          Carl       1000.0
1           Joe      50000.0
2          Mark          NaN