python drop duplicates by certain order (not `first`, `last`)

Question

ID  values
111 reason1
111 reason2
111 reason3
222 reason2
222 reason4
222 reason5

df.drop_duplicates(["ID"], keep='???', inplace=True)

The only approach I know is drop_duplicates, but its keep option only offers first and last. I want to keep one record per ID according to a priority order: if the ID has a record with reason2, keep that one; otherwise keep the record with reason3 if there is one, and so on. In other words, there is a particular order, such as reason2, reason3, reason4, etc.

Answer 1

Based on the comments, here is one possible implementation (implementing @brittenb's idea):

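# Lower number = higher priority; adjust the numbers to the order you need.
priority_dict = {
    'reason1': 1,
    'reason2': 2,
    'reason3': 3,
    'reason4': 4,
    'reason5': 5
}
df['priority'] = df['values'].map(priority_dict)
# Sort so that the highest-priority row comes first within each ID.
df = df.sort_values(by=['ID', 'priority'])
# drop_duplicates returns a new DataFrame, so assign the result.
df = df.drop_duplicates(['ID'], keep='first')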

Output:

    ID   values  priority
0  111  reason1         1
3  222  reason2         2
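
The same idea can be applied to the order asked for in the question. This is only a minimal sketch: the sample frame is rebuilt from the question, the names `order` and `rank` are made up for illustration, and putting reason1 last is an assumption, since the question does not say where it belongs.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "ID": [111, 111, 111, 222, 222, 222],
    "values": ["reason1", "reason2", "reason3",
               "reason2", "reason4", "reason5"],
})

# Preferred order, best first. Placing 'reason1' last is an assumption.
order = ["reason2", "reason3", "reason4", "reason5", "reason1"]
rank = {reason: i for i, reason in enumerate(order)}

result = (
    df.assign(priority=df["values"].map(rank))  # values missing from rank become NaN
      .sort_values(["ID", "priority"])          # NaN sorts last by default
      .drop_duplicates("ID", keep="first")      # keep the best-ranked row per ID
      .drop(columns="priority")                 # remove the helper column again
)
print(result)
```

With this order, the reason2 row is kept for both 111 and 222. Building the ranking from a list keeps the priorities in one place, so changing the order only means reordering the list.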

Answer 2

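Use an ordered 'category' dtype with the categories listed in priority order, then sort:

import pandas as pd

# An ordered CategoricalDtype lists the categories from highest to lowest priority.
reason_order = pd.CategoricalDtype(['reason2', 'reason3', 'reason1',
                                    'reason4', 'reason5'], ordered=True)
df['values'] = df['values'].astype(reason_order)

# Sorting follows the category order, so the first row per ID is the preferred one.
df.sort_values('values').drop_duplicates('ID', keep='first')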

Output:

    ID   values
1  111  reason2
3  222  reason2
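
If sorting the whole frame is not wanted, a variant is to pick the best-ranked row per ID with groupby and idxmin. This is a sketch rather than part of the answer above: the names `order`, `rank` and `best` are illustrative, and it assumes every entry in values appears in `order`.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "ID": [111, 111, 111, 222, 222, 222],
    "values": ["reason1", "reason2", "reason3",
               "reason2", "reason4", "reason5"],
})

# Same priority order as the category list above, best first.
order = ["reason2", "reason3", "reason1", "reason4", "reason5"]

# Numeric rank for each row (0 is the most preferred reason).
rank = df["values"].map({reason: i for i, reason in enumerate(order)})

# For each ID, the index label of the row with the smallest rank.
best = rank.groupby(df["ID"]).idxmin()

# Select those rows; the original frame is left unsorted and unchanged.
result = df.loc[best]
print(result)
```

This selects the same two rows as the sorted version (index 1 and 3) and leaves the values column as plain strings.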

2 answers

solution1
4 ACCEPTED 2018-06-05 21:02:05

solution2
0 2018-06-05 21:08:08
