Remove similar character string duplicates from a dataframe

Original post, 2019-10-17 14:13:58. Tags: python, pandas, dataframe, data-cleaning

I have a DataFrame, df, that currently looks something like this:

Car Name      Number
Adam Leaf     9
Adamm Leaf    9
Adam Lea      NaN
Adam-Leaf     NaN
Adam/Leaf     9
Claire-Green  NaN
Cliare Green  3
Claire Green  3
Claire Gren   NaN
Claire/Green  3
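To try the answers, the sample frame from the question can be rebuilt like this. This is a minimal sketch: the values are copied from the table above, with NaN standing in for the blank cells.

```python
import numpy as np
import pandas as pd

# Sample data copied from the table above; NaN marks a missing Number.
df = pd.DataFrame({
    "Car Name": ["Adam Leaf", "Adamm Leaf", "Adam Lea", "Adam-Leaf", "Adam/Leaf",
                 "Claire-Green", "Cliare Green", "Claire Green", "Claire Gren",
                 "Claire/Green"],
    "Number": [9, 9, np.nan, np.nan, 9, np.nan, 3, 3, np.nan, 3],
})
print(df)
```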

I am trying to remove the misspelled and punctuation variants of each name so that only one row per name remains, like this:

Car Name      Number
Adam Leaf     9
Claire Green  3

2 answers

Here is one way, using the Soundex phonetic encoding from the jellyfish library:

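import jellyfish

# Group rows whose names share the same Soundex code, then keep the
# first non-null value of each column within every group.
s = df.groupby(df['Car Name'].apply(jellyfish.soundex)).first()
print(s)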
              Car Name  Number
Car Name                      
A354         Adam Leaf     9.0
C462      Claire-Green     3.0
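If jellyfish is not installed, the idea behind this answer can be tried with a hand-written Soundex. The soundex function below is a hypothetical, simplified American Soundex written for illustration, not jellyfish's implementation, so results may differ on edge cases (here, non-letters such as "-" and "/" are simply skipped). Note that .first() returns the first non-null value per column, so a group's Car Name and Number can come from different rows.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for jellyfish.soundex: a simplified American Soundex.
# Non-letters are skipped; jellyfish may treat edge cases differently.
CODES = {ch: digit
         for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                                ("L", "4"), ("MN", "5"), ("R", "6"))
         for ch in letters}


def soundex(name: str) -> str:
    letters = [c for c in name.upper() if c.isalpha()]
    if not letters:
        return ""
    result = letters[0]                  # keep the first letter as-is
    prev = CODES.get(letters[0], "")
    for ch in letters[1:]:
        digit = CODES.get(ch, "")
        if digit and digit != prev:      # skip repeats of the same code
            result += digit
            if len(result) == 4:
                break
        if ch not in "HW":               # vowels reset prev; H and W do not
            prev = digit
    return result.ljust(4, "0")          # pad short codes, e.g. "L000"


df = pd.DataFrame({
    "Car Name": ["Adam Leaf", "Adamm Leaf", "Adam Lea", "Adam-Leaf", "Adam/Leaf",
                 "Claire-Green", "Cliare Green", "Claire Green", "Claire Gren",
                 "Claire/Green"],
    "Number": [9, 9, np.nan, np.nan, 9, np.nan, 3, 3, np.nan, 3],
})

# One group per Soundex key; .first() takes the first non-null value of
# each column inside the group.
s = df.groupby(df["Car Name"].apply(soundex)).first()
print(s)
```

Because Soundex keeps only the first letter and the next three consonant codes, unrelated names can collide: under this sketch soundex("Adam Lewis") is also A354, so it would be merged with Adam Leaf.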

This can also be solved by calculating the Levenshtein distance between names or, even better, by using the FuzzyWuzzy library, as described in this tutorial:

https://www.datacamp.com/community/tutorials/fuzzy-string-python
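A rough sketch of the edit-distance approach, using only the standard library and pandas rather than FuzzyWuzzy. The helpers levenshtein, clean and canonical_names are hypothetical and written for this example; the 2-edit threshold and the choice to treat "-" and "/" as spaces are assumptions tuned to the sample data.

```python
import numpy as np
import pandas as pd


def levenshtein(a: str, b: str) -> int:
    """Edit distance: minimum number of single-character inserts,
    deletes and substitutions turning a into b (two-row DP)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # delete ca
                            curr[j - 1] + 1,            # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute
        prev = curr
    return prev[-1]


def clean(name: str) -> str:
    # Assumption: '-' and '/' are just alternative word separators.
    return name.replace("-", " ").replace("/", " ")


def canonical_names(names, max_dist=2):
    """Greedy clustering: map each name to the first representative seen so
    far within max_dist edits (case-insensitive), else make it a new one."""
    reps, mapping = [], {}
    for name in names:
        cleaned = clean(name)
        for rep in reps:
            if levenshtein(cleaned.lower(), rep.lower()) <= max_dist:
                mapping[name] = rep
                break
        else:
            reps.append(cleaned)
            mapping[name] = cleaned
    return mapping


df = pd.DataFrame({
    "Car Name": ["Adam Leaf", "Adamm Leaf", "Adam Lea", "Adam-Leaf", "Adam/Leaf",
                 "Claire-Green", "Cliare Green", "Claire Green", "Claire Gren",
                 "Claire/Green"],
    "Number": [9, 9, np.nan, np.nan, 9, np.nan, 3, 3, np.nan, 3],
})

df["Car Name"] = df["Car Name"].map(canonical_names(df["Car Name"]))
# Collapse each cluster to one row, keeping its first non-null Number.
result = df.groupby("Car Name")["Number"].first().reset_index()
print(result)
```

The threshold matters: Levenshtein counts the swapped letters in "Cliare" as two substitutions, so max_dist=1 would leave it as a separate name. The greedy pass also depends on row order and compares every name with every representative, so on large frames it is worth restricting comparisons, for example to rows that share a Soundex key as in the first answer.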

The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint them, please credit the site URL or the original address. For any questions, contact: yoyou2525@163.com.

Guangdong ICP filing No. 18138465 © 2020-2024 STACKOOM.COM