How do I compare two columns in a Pandas DataFrame to find the match percentage and return a value based on that logic?
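I need to compare two columns in a Pandas DataFrame and do fuzzy matching between them.
If the fuzzy match score is above a certain percentage (say 85), I need to return either that percentage or the string "Partial Match".
If the values match exactly, return "Full Match".
If they do not match, return "No Match".
Solutions I have tried:
Attempt #1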
conditions = [
    (df['one'] == df['two']),
    fuzz.ratio((df['one'], df['two'])) > 80,
    fuzz.ratio((df['one'], df['two'])) <= 80]
choices = ["FULL Match", fuzz.ratio((df['one'], df['two'])), "NO MATCH"]
df['result'] = np.select(condition, choices, default=np.nan)
====================================================================
Attempt #2
df['result'] = np.where(fuzz.ratio(df['one'], df['two']) >= 85, "Partial Match", 'No Match')
import pandas as pd
import numpy as np
from fuzzywuzzy import fuzz
import os
df = pd.read_csv('data.csv')
x = fuzz.ratio(df['one'], df['two']) >= 85
df['result'] = np.where(x, "Match", 'No Match')
Expected result
      one          two         result
0   apple        Apple  Partial Match
1  banana      bannana  Partial Match
2    kiwi  dragonfruit       No Match
3   mango        mango     Full Match
====================================================================
Error messages:
Attempt #1
IndexError: tuple index out of range
Attempt #2
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
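Both errors most likely share one root cause: fuzz.ratio compares a single pair of strings, not whole columns. In attempt #1 it receives one tuple argument, and in attempt #2 it receives two Series, whose element-wise comparison cannot be reduced to a single True/False. Below is a minimal sketch of scoring row by row instead. The sample data is reconstructed from the expected result in the question, and difflib.SequenceMatcher from the standard library stands in for fuzz.ratio (an assumption; its scores may differ from fuzzywuzzy's).

```python
from difflib import SequenceMatcher

import pandas as pd

# Hypothetical sample data mirroring the expected-result table
df = pd.DataFrame({
    "one": ["apple", "banana", "kiwi", "mango"],
    "two": ["Apple", "bannana", "dragonfruit", "mango"],
})

def score(a, b):
    # Stand-in for fuzz.ratio: similarity of two strings on a 0-100 scale
    return round(100 * SequenceMatcher(None, a, b).ratio())

# Call the scorer once per row, passing plain strings rather than Series
df["score"] = [score(a, b) for a, b in zip(df["one"], df["two"])]
print(df)
```

The resulting numeric column can then be compared against a threshold with np.where or np.select, since those operate on a whole column of scores at once.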
Try combining the last two commands into one:
df['result'] = np.where(fuzz.ratio(df['one'], df['two']) >= 85, "Match", 'No Match')
I think this should solve the problem:
from difflib import SequenceMatcher

def similar(a, b):
    # ratio() returns a similarity score between 0.0 and 1.0
    match_score = SequenceMatcher(None, a, b).ratio()
    if match_score == 1.0:
        result = "Full Match"
    elif match_score >= .85:
        result = "Partial Match"
    else:
        result = "No Match"
    return result

# Apply row by row so that similar() receives one pair of strings at a time
df["result"] = df[['one', 'two']].apply(lambda row: similar(row.one, row.two), axis=1)
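To see how the thresholds behave, the same logic can be checked against the sample rows from the question. This is only a sketch: classify and its threshold parameter are hypothetical names introduced here for illustration, and the scores are those of difflib, not fuzzywuzzy.

```python
from difflib import SequenceMatcher

def classify(a, b, threshold=0.85):
    # Hypothetical variant of similar() with an adjustable cutoff
    score = SequenceMatcher(None, a, b).ratio()
    if score == 1.0:
        return "Full Match"
    if score >= threshold:
        return "Partial Match"
    return "No Match"

# Sample pairs taken from the expected-result table in the question
pairs = [("apple", "Apple"), ("banana", "bannana"),
         ("kiwi", "dragonfruit"), ("mango", "mango")]

for a, b in pairs:
    # Show the label at the default cutoff and at a looser 0.8 cutoff
    print(a, b, classify(a, b), classify(a, b, threshold=0.8))
```

Because the comparison is case-sensitive, "apple" vs "Apple" scores 0.8 and is labelled "No Match" at the 0.85 cutoff; a cutoff of 0.8 gives the "Partial Match" shown in the expected result. With pandas, classify could be passed to apply in the same way as similar().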