There is a package called fuzzy_pandas that can match strings using a Levenshtein ratio. It comes with some great examples, such as this one:
import pandas as pd
import fuzzy_pandas as fpd
df1 = pd.DataFrame({'Key':['Apple', 'Banana', 'Orange', 'Strawberry']})
df2 = pd.DataFrame({'Key':['Aple', 'Mango', 'Orag', 'Straw', 'Bannanna', 'Berry']})
results = fpd.fuzzy_merge(df1, df2,
                          left_on='Key',
                          right_on='Key',
                          method='levenshtein',
                          threshold=0.6)
results.head()
I don't know whether it's possible to display the similarity ratio of each match (the value that is compared against the threshold) in the results.
The output is:
      Key       Key
0   Apple      Aple
1  Banana  Bannanna
2  Orange      Orag
And I want something like:
      Key       Key  Ratio
0   Apple      Aple   0.81
1  Banana  Bannanna   0.87
2  Orange      Orag   0.78
Perhaps this is possible with another library?
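To add a column with the similarity score of each match, you can use the following code. It assumes fuzz is the module from the fuzzywuzzy package (thefuzz and rapidfuzz provide a function of the same name). Because both result columns are named Key, they are accessed by position rather than by label. The score ranges from 0 to 100, so divide by 100 if you want a ratio between 0 and 1:
from fuzzywuzzy import fuzz
results['Similarity'] = results.apply(lambda x: fuzz.token_set_ratio(x.iloc[0], x.iloc[1]), axis=1)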
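If you would rather not depend on another library, the ratio can also be computed by hand. The following is only a minimal sketch: the helper names levenshtein_distance and similarity_ratio are hypothetical, and the normalization (1 minus the distance divided by the longer string's length) is an assumption. fuzzy_pandas and other libraries may normalize differently, so the numbers need not match theirs.

```python
def levenshtein_distance(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, or substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def similarity_ratio(a: str, b: str) -> float:
    """Assumed normalization: 1.0 for identical strings, 0.0 for nothing in common."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein_distance(a, b) / longest


# The three pairs that fuzzy_merge returned in the question.
matched_pairs = [("Apple", "Aple"), ("Banana", "Bannanna"), ("Orange", "Orag")]
rows = [(left, right, round(similarity_ratio(left, right), 2))
        for left, right in matched_pairs]

for left, right, ratio in rows:
    print(left, right, ratio)
```

With the merged DataFrame, the same function could be applied row by row, for example results.apply(lambda row: similarity_ratio(row.iloc[0], row.iloc[1]), axis=1), again using positional access because both columns share the name Key.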