I have a DataFrame df that currently looks something like this:
Car Name Number
Adam Leaf 9
Adamm Leaf 9
Adam Lea NaN
Adam-Leaf NaN
Adam/Leaf 9
Claire-Green NaN
Cliare Green 3
Claire Green 3
Claire Gren NaN
Claire/Green 3
I am trying to collapse the spelling variations into one row per name, to get something like this:
Car Name Number
Adam Leaf 9
Claire Green 3
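Here is one way, using the Soundex phonetic encoding from the jellyfish library:
import jellyfish
# Group rows by the Soundex code of the name; first() keeps the first non-null value in each column
s = df.groupby(df['Car Name'].apply(jellyfish.soundex)).first()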
Car Name Number
Car Name
A354 Adam Leaf 9.0
C462 Claire-Green 3.0
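To see why all the variants fall into just two groups, it may help to look at how a Soundex key is built. The following is a minimal pure-Python sketch of the standard American Soundex rules, not jellyfish's actual implementation. The function name soundex_key and the decision to ignore non-letters such as spaces, hyphens, and slashes are assumptions made for illustration, and jellyfish may handle edge cases differently.

```python
def soundex_key(name: str) -> str:
    """Simplified American Soundex (illustrative sketch, not jellyfish's code).

    Assumption: anything that is not a letter (space, '-', '/') is ignored.
    """
    # Map each consonant to its Soundex digit; vowels, H, W and Y have no digit.
    codes = {}
    for group, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                         ("L", "4"), ("MN", "5"), ("R", "6")):
        for ch in group:
            codes[ch] = digit

    letters = [c for c in name.upper() if c.isalpha()]
    if not letters:
        return ""

    key = letters[0]                    # the first letter is kept as-is
    prev = codes.get(letters[0], "")    # digit of the previous letter
    for ch in letters[1:]:
        code = codes.get(ch, "")
        if code and code != prev:       # skip adjacent letters with the same digit
            key += code
        if ch not in "HW":              # H and W do not separate equal digits
            prev = code
    return (key + "000")[:4]            # pad or truncate to four characters


names = ["Adam Leaf", "Adamm Leaf", "Adam Lea", "Adam-Leaf", "Adam/Leaf",
         "Claire-Green", "Cliare Green", "Claire Green", "Claire Gren",
         "Claire/Green"]
keys = {name: soundex_key(name) for name in names}
```

Under these assumptions every "Adam Leaf" variant maps to one key and every "Claire Green" variant to another, which is why groupby produces two rows. Note that Soundex only keeps the first letter plus three digits, so unrelated names that sound alike can also collide.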
This can be solved by computing the Levenshtein distance between names or, even better, by using the FuzzyWuzzy library:
https://www.datacamp.com/community/tutorials/fuzzy-string-python
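As a rough illustration of the edit-distance idea, here is a standard-library-only sketch that does not use FuzzyWuzzy itself. The helper names normalize, levenshtein, and cluster_names, the threshold max_distance=2, and the rule of comparing each name against the first member of a cluster are all assumptions chosen for this sample data, not part of any library.

```python
import re


def normalize(name: str) -> str:
    # Lowercase and turn runs of non-alphanumerics ('-', '/', spaces) into one space.
    return re.sub(r"[^a-z0-9]+", " ", name.lower()).strip()


def levenshtein(a: str, b: str) -> int:
    # Classic dynamic-programming edit distance, keeping one row at a time.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def cluster_names(names, max_distance=2):
    # Greedy clustering: a name joins the first cluster whose first member
    # is within max_distance edits; otherwise it starts a new cluster.
    clusters = []
    for name in names:
        for cluster in clusters:
            if levenshtein(normalize(name), normalize(cluster[0])) <= max_distance:
                cluster.append(name)
                break
        else:
            clusters.append([name])
    return clusters


names = ["Adam Leaf", "Adamm Leaf", "Adam Lea", "Adam-Leaf", "Adam/Leaf",
         "Claire-Green", "Cliare Green", "Claire Green", "Claire Gren",
         "Claire/Green"]
clusters = cluster_names(names)
```

With the sample names this yields two clusters of five names each. The cluster index could then be used as a groupby key in the same way as the Soundex code. Which spelling to keep as the canonical one is a separate decision; the sketch simply puts the first name seen at the head of each cluster, and the threshold would need tuning on real data.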