I want to remove duplicates in a column via Pandas. I tried df.drop_duplicates()
but no luck. How to achieve this in Pandas?
Input :
A
team=red, Manager=Travis
team=Blue, Manager=John, team=Blue
Manager=David, Bank=HDFC, team=XYZ, Bank=HDFC
Expected_Output :
A
team=red, Manager=Travis
team=Blue, Manager=John
Manager=David, Bank=HDFC, team=XYZ
Code
df = df.drop_duplicates('A', keep='last')
You can use some data structures to achieve this result.
print(df['A'])
0 team=red, Manager=Travis
1 team=Blue, Manager=John, team=Blue
2 Manager=David, Bank=HDFC, team=XYZ, Bank=HDFC
Name: A, dtype: object
out = (
df['A'].str.split(r',\s+')
.map(set)
.str.join(", ")
)
print(out)
0 Manager=Travis, team=red
1 team=Blue, Manager=John
2 Bank=HDFC, team=XYZ, Manager=David
Name: A, dtype: object
Alternatively, if the order of your string entries is important, you can use dict.fromkeys
instead of a set
. Since dictionaries are implicitly ordered as of Py > 3.6
out = (
df['A'].str.split(r',\s+')
.map(dict.fromkeys)
.str.join(", ")
)
print(out)
0 team=red, Manager=Travis
1 team=Blue, Manager=John
2 Manager=David, Bank=HDFC, team=XYZ
Name: A, dtype: object
Try:
df['A'].str.split(',').explode().str.strip(' ')\
.drop_duplicates().groupby(level=0).agg(','.join)
Output:
0 team=red,Manager=Travis
1 team=Blue,Manager=John
2 Manager=David,Bank=HDFC,team=XYZ
Name: A, dtype: object
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.