
Remove similar character string duplicates from a dataframe

Original post: 2019-10-17 14:13:58. Tags: python, pandas, dataframe, data-cleaning

I have a df that currently looks like this:

Car Name      Number
Adam Leaf     9
Adamm Leaf    9
Adam Lea      NaN
Adam-Leaf     NaN
Adam/Leaf     9
Claire-Green  NaN
Cliare Green  3
Claire Green  3
Claire Gren   NaN
Claire/Green  3

I am trying to remove the spelling variations so that I end up with something like this:

Car Name      Number
Adam Leaf     9
Claire Green  3

2 Answers

Here is one way to do it with jellyfish:

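import jellyfish

# Build a Soundex key from each name, group rows that share a key,
# and keep the first non-null value of every column in each group
s = df.groupby(df['Car Name'].apply(jellyfish.soundex)).first()
print(s)

Output:

              Car Name  Number
Car Name                      
A354         Adam Leaf     9.0
C462      Claire-Green     3.0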
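The grouping above keeps whichever spelling happens to come first in each group (hence `Claire-Green`), and `Number` comes back as float because the column contains NaN. A minimal sketch of a variant, under these assumptions: `-`, `/` and other non-letters are only separators and are replaced by a space; the representative name is the most frequent cleaned spelling in the group; and, to avoid the jellyfish dependency, it uses a hand-rolled, simplified American Soundex. `soundex` and `dedupe_by_soundex` are hypothetical helpers, not jellyfish's API.

```python
import re

import numpy as np
import pandas as pd

# Soundex digit for each consonant group; vowels, Y, H and W get no digit.
_CODES = {ch: digit
          for letters, digit in [("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                                 ("L", "4"), ("MN", "5"), ("R", "6")]
          for ch in letters}


def soundex(name: str) -> str:
    """Simplified American Soundex: first letter + 3 digits (hypothetical helper)."""
    letters = re.sub(r"[^A-Z]", "", name.upper())  # drop spaces, '-', '/', etc.
    if not letters:
        return ""
    result = letters[0]
    last = _CODES.get(letters[0])  # a repeat of the first letter's code is skipped
    for ch in letters[1:]:
        code = _CODES.get(ch)
        if code is not None:
            if code != last:
                result += code
            last = code
        elif ch not in "HW":
            last = None  # a vowel separates two equal codes; H and W do not
        if len(result) == 4:
            break
    return result.ljust(4, "0")


def dedupe_by_soundex(df: pd.DataFrame, col: str = "Car Name") -> pd.DataFrame:
    """Collapse rows whose cleaned names share a Soundex key (hypothetical helper)."""
    # Treat '-', '/' and runs of spaces as a single space.
    clean = df[col].str.replace(r"[^A-Za-z]+", " ", regex=True).str.strip()
    key = clean.map(soundex).rename("soundex")
    # First non-null value for every column, most frequent spelling for the name.
    agg_spec = {c: "first" for c in df.columns}
    agg_spec[col] = lambda s: s.mode().iloc[0]
    return (df.assign(**{col: clean})
              .groupby(key)
              .agg(agg_spec)
              .reset_index(drop=True))


df = pd.DataFrame({
    "Car Name": ["Adam Leaf", "Adamm Leaf", "Adam Lea", "Adam-Leaf", "Adam/Leaf",
                 "Claire-Green", "Cliare Green", "Claire Green", "Claire Gren",
                 "Claire/Green"],
    "Number": [9, 9, np.nan, np.nan, 9, np.nan, 3, 3, np.nan, 3],
})
result = dedupe_by_soundex(df)
print(result)
```

Soundex only looks at the first letter and the next three consonant codes, so different names that start alike (for example "Adam Lee", which also encodes to A354) would be merged too. `result.astype({"Number": "Int64"})` turns the float column back into integers if needed.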

This can be solved by computing the Levenshtein distance, or better yet, by using the FuzzyWuzzy library:

https://www.datacamp.com/community/tutorials/fuzzy-string-python
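A minimal sketch of the Levenshtein idea, written without FuzzyWuzzy so that it only needs pandas and NumPy. Assumptions: names are compared after replacing non-letters with a space and lowercasing; two names count as the same when their edit distance is at most 20% of the longer name (`max_ratio`, an arbitrary threshold to tune on real data); `levenshtein` and `cluster_labels` are hypothetical helpers.

```python
import numpy as np
import pandas as pd


def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert/delete/substitute cost 1)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # delete ca
                            curr[j - 1] + 1,            # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = curr
    return prev[-1]


def cluster_labels(names, max_ratio=0.2):
    """Greedy clustering: join the first cluster whose representative is close enough."""
    reps, labels = [], []
    for name in names:
        for idx, rep in enumerate(reps):
            if levenshtein(name, rep) <= max_ratio * max(len(name), len(rep)):
                labels.append(idx)
                break
        else:  # no close representative found: start a new cluster
            reps.append(name)
            labels.append(len(reps) - 1)
    return labels


df = pd.DataFrame({
    "Car Name": ["Adam Leaf", "Adamm Leaf", "Adam Lea", "Adam-Leaf", "Adam/Leaf",
                 "Claire-Green", "Cliare Green", "Claire Green", "Claire Gren",
                 "Claire/Green"],
    "Number": [9, 9, np.nan, np.nan, 9, np.nan, 3, 3, np.nan, 3],
})

# Normalise separators, then cluster on the lowercased names.
clean = df["Car Name"].str.replace(r"[^A-Za-z]+", " ", regex=True).str.strip()
labels = pd.Series(cluster_labels(clean.str.lower().tolist()),
                   index=df.index, name="cluster")

# Keep the most frequent spelling and the first non-null Number per cluster.
result = (df.assign(**{"Car Name": clean})
            .groupby(labels)
            .agg({"Car Name": lambda s: s.mode().iloc[0], "Number": "first"})
            .reset_index(drop=True))
print(result)
```

The greedy clustering depends on row order and compares every name against every existing representative, so it suits a modest number of distinct names. With FuzzyWuzzy installed, a similarity score such as `fuzz.ratio(a, b)` (0 to 100) could replace the distance test, with a threshold chosen on your own data.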


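Disclaimer: The technical posts on this site are licensed under CC BY-SA 4.0. If you repost them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.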


Guangdong ICP registration No. 18138465 © 2020-2024 STACKOOM.COM