How do I compare two columns in a Pandas DataFrame to find the match percentage and return a value based on that logic?
I need to compare two columns in a Pandas DataFrame using fuzzy matching.
If the fuzzy match score is above a certain percentage (e.g. 85), I need to return that percentage or a string saying "Partial Match".
If the values match exactly, return "Full Match".
If they don't match, return "No Match".
Solutions I've tried:
Attempt #1
conditions = [
    (df['one'] == df['two']),
    fuzz.ratio((df['one'], df['two'])) > 80,
    fuzz.ratio((df['one'], df['two'])) <= 80]
choices = ["FULL Match", fuzz.ratio((df['one'], df['two'])), "NO MATCH"]
df['result'] = np.select(conditions, choices, default=np.nan)
====================================================================
Attempt #2
import pandas as pd
import numpy as np
from fuzzywuzzy import fuzz
import os
df = pd.read_csv('data.csv')
x = fuzz.ratio(df['one'], df['two']) >= 85
df['result'] = np.where(x, "Match", 'No Match')
Expected Result
one two result
0 apple Apple Partial Match
1 banana bannana Partial Match
2 kiwi dragonfruit No Match
3 mango mango Full Match
====================================================================
Error Messages:
Attempt #1
IndexError: tuple index out of range
Attempt #2
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
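The Attempt #2 error is what pandas raises whenever a whole Series is used where a single True/False is expected. Comparing two columns is element-wise and yields one boolean per row, so code written for one pair of strings (for example an if test) cannot use it. That fuzz.ratio hits this internally when handed two Series is an assumption here; the minimal sketch below reproduces the same error with plain pandas and shows the per-row alternative. The sample values and variable names are illustrative only.

```python
import pandas as pd

one = pd.Series(["apple", "mango"])
two = pd.Series(["Apple", "mango"])

# Comparing two Series is element-wise: one boolean per row, not one overall answer.
equal = one == two
print(equal.tolist())  # [False, True]

# Using that Series where a single True/False is required raises the ValueError.
message = None
try:
    if equal:  # pandas cannot decide which row's answer to use
        pass
except ValueError as err:
    message = str(err)
print(message)

# Asking the question per row instead works: compare each pair of strings separately.
same_per_row = [a == b for a, b in zip(one, two)]
print(same_per_row)  # [False, True]
```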
I also tried combining the last two commands into one:
df['result'] = np.where(fuzz.ratio(df['one'], df['two']) >= 85, "Match", 'No Match')
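One way to keep the np.select shape of Attempt #1 is to compute the score row by row first and only then build the conditions from ordinary columns. This is a sketch under a few assumptions: it uses the sample data from the Expected Result table, substitutes the standard library's difflib.SequenceMatcher for fuzz.ratio so it has no third-party dependency (its scores can differ from fuzzywuzzy's), and lowercases both strings before scoring so that "apple" vs "Apple" counts as a partial rather than a full match. The names score and THRESHOLD are hypothetical, not part of any library.

```python
from difflib import SequenceMatcher

import numpy as np
import pandas as pd

# Sample data taken from the "Expected Result" table.
df = pd.DataFrame({
    "one": ["apple", "banana", "kiwi", "mango"],
    "two": ["Apple", "bannana", "dragonfruit", "mango"],
})


def score(a, b):
    """Case-insensitive similarity of two strings on a 0-100 scale.

    Hypothetical helper: difflib.SequenceMatcher stands in for fuzz.ratio here.
    """
    return round(100 * SequenceMatcher(None, a.lower(), b.lower()).ratio())


# Score each pair of strings individually; the scorer takes strings, not Series.
df["score"] = [score(a, b) for a, b in zip(df["one"], df["two"])]

THRESHOLD = 85  # assumed cut-off, taken from the question

conditions = [
    df["one"] == df["two"],    # exact, case-sensitive match
    df["score"] >= THRESHOLD,  # similar enough to count as partial
]
choices = ["Full Match", "Partial Match"]
df["result"] = np.select(conditions, choices, default="No Match")

print(df)
```

Keeping score as its own column covers the "return that percentage" option. The order of conditions matters: np.select uses the first condition that is true for each row, so the exact-match test has to come before the threshold test.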
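I think this does the trick. fuzz.ratio compares two strings, not two whole columns, so the score has to be computed for each row. This version uses the standard library's difflib.SequenceMatcher and applies it row by row: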
from difflib import SequenceMatcher

def similar(a, b):
    # ratio() returns a similarity score between 0.0 and 1.0
    match_score = SequenceMatcher(None, a, b).ratio()
    if match_score == 1.0:
        result = "Full Match"
    elif match_score >= .85:
        result = "Partial Match"
    else:
        result = "No Match"
    return result

# Apply the function to each row (axis=1) of the two columns
df["result"] = df[['one', 'two']].apply(lambda row: similar(row.one, row.two), axis=1)
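SequenceMatcher is case-sensitive, so the similar function above scores "apple" vs "Apple" at 0.8 and labels it "No Match" rather than the "Partial Match" shown in the expected result. A minimal variant, assuming that exact equality should mean "Full Match" and a case-insensitive score of at least 0.85 should mean "Partial Match". The function name classify and its return_score flag are hypothetical; with return_score=True, partial matches return the rounded percentage instead of the label, as the question also asked for.

```python
from difflib import SequenceMatcher

import pandas as pd


def classify(a, b, threshold=0.85, return_score=False):
    """Label a pair of strings as a full, partial, or non-match.

    Full Match: the strings are identical (case-sensitive).
    Partial Match: the case-insensitive similarity is at least `threshold`.
    With return_score=True, a partial match returns the rounded percentage.
    """
    if a == b:
        return "Full Match"
    ratio = SequenceMatcher(None, a.lower(), b.lower()).ratio()
    if ratio >= threshold:
        return round(ratio * 100) if return_score else "Partial Match"
    return "No Match"


df = pd.DataFrame({
    "one": ["apple", "banana", "kiwi", "mango"],
    "two": ["Apple", "bannana", "dragonfruit", "mango"],
})

# Labels only, then labels with percentages for partial matches.
df["result"] = df.apply(lambda row: classify(row["one"], row["two"]), axis=1)
df["result_pct"] = df.apply(
    lambda row: classify(row["one"], row["two"], return_score=True), axis=1
)
print(df)
```

Because result_pct mixes numbers and strings, pandas stores it as an object column; if the percentages are needed for further arithmetic, a separate numeric score column is easier to work with.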