remove duplicates values in a column seperated by comma in Pandas

Question

I want to remove duplicates in a column via Pandas. I tried df.drop_duplicates() but no luck. How to achieve this in Pandas?

Input :

A
team=red, Manager=Travis
team=Blue, Manager=John, team=Blue
Manager=David, Bank=HDFC, team=XYZ, Bank=HDFC

Expected_Output :

A
team=red, Manager=Travis
team=Blue, Manager=John
Manager=David, Bank=HDFC, team=XYZ

Code

df = df.drop_duplicates('A', keep='last')

Answer 1

You can use some data structures to achieve this result.

split entries
convert to set (or some non duplicated structure)
join back to string

print(df['A'])
0                         team=red, Manager=Travis
1               team=Blue, Manager=John, team=Blue
2    Manager=David, Bank=HDFC, team=XYZ, Bank=HDFC
Name: A, dtype: object


out = (
    df['A'].str.split(r',\s+')
    .map(set)
    .str.join(", ")
)

print(out)
0              Manager=Travis, team=red
1               team=Blue, Manager=John
2    Bank=HDFC, team=XYZ, Manager=David
Name: A, dtype: object

Alternatively, if the order of your string entries is important, you can use dict.fromkeys instead of a set . Since dictionaries are implicitly ordered as of Py > 3.6

out = (
    df['A'].str.split(r',\s+')
    .map(dict.fromkeys)
    .str.join(", ")
)

print(out)
0              team=red, Manager=Travis
1               team=Blue, Manager=John
2    Manager=David, Bank=HDFC, team=XYZ
Name: A, dtype: object

Answer 2

Try:

df['A'].str.split(',').explode().str.strip(' ')\
       .drop_duplicates().groupby(level=0).agg(','.join)

Output:

0             team=red,Manager=Travis
1              team=Blue,Manager=John
2    Manager=David,Bank=HDFC,team=XYZ
Name: A, dtype: object

remove duplicates values in a column seperated by comma in Pandas

Question

2 answers

solution1
2 2022-03-22 14:49:19

solution2
1 2022-03-22 14:46:36

remove duplicates values in a column seperated by comma in Pandas

Question

2 answers

solution1 2 2022-03-22 14:49:19

solution2 1 2022-03-22 14:46:36

solution1
2 2022-03-22 14:49:19

solution2
1 2022-03-22 14:46:36