How do I compare two columns in a Pandas DataFrame to find the match percentage and return a value based on that logic?
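I need to compare two columns in a Pandas DataFrame using fuzzy matching.
If the fuzzy match score is above a certain percentage (for example, 85), I need to return either that percentage or the string "Partial Match".
If the values match exactly, return "Full Match".
If they do not match, return "No Match".
Solutions I have tried:
Attempt #1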
conditions = [
(df['one'] == df['two']),fuzz.ratio((df['one'],df['two'])) > 80,
fuzz.ratio((df['one'],df['two'])) <= 80]
choices = ["FULL Match", fuzz.ratio((df['one'],df['two'])),"NO MATCH"]
df['result'] = np.select(condition,choices, default = np.nan)
====================================================================
Attempt #2
df['result'] = np.where(fuzz.ratio(df['one'], df['two']) >= 85, "Partial Match", 'No Match')
import pandas as pd
import numpy as np
from fuzzywuzzy import fuzz
import os
df = pd.read_csv('data.csv')
x = fuzz.ratio(df['one'], df['two']) >= 85
df['result'] = np.where(x, "Match", 'No Match')
Expected result
one two result
0 apple Apple Partial Match
1 banana bannana Partial Match
2 kiwi dragonfruit No Match
3 mango mango Full Match
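====================================================================
Error messages:
Attempt #1
IndexError: tuple index out of range
Attempt #2
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
I also tried combining the last two commands into one: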
df['result'] = np.where(fuzz.ratio(df['one'], df['two']) >= 85, "Match", 'No Match')
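I think this should solve the problem:
from difflib import SequenceMatcher

def similar(a, b):
    # ratio() returns a similarity score between 0.0 and 1.0
    match_score = SequenceMatcher(None, a, b).ratio()
    if match_score == 1.0:
        result = "Full Match"
    elif match_score >= .85:
        result = "Partial Match"
    else:
        result = "No Match"
    return result

# Compare the two columns row by row; each row is passed to similar() as a pair of strings
df["result"] = df[['one', 'two']].apply(lambda row: similar(row.one, row.two), axis=1)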
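Both attempts in the question fail for the same underlying reason: fuzz.ratio compares two strings, not two whole columns, so the comparison has to be made once per row. Below is a minimal sketch of that per-row pattern using difflib from the standard library. The helper name match_label and its threshold and ignore_case parameters are made up for illustration and are not part of pandas or fuzzywuzzy. It also includes the percentage in the partial-match label, which the question lists as an option.

```python
from difflib import SequenceMatcher

import pandas as pd


def match_label(a, b, threshold=0.85, ignore_case=False):
    """Classify two strings as a full, partial, or no match (hypothetical helper)."""
    a, b = str(a), str(b)
    if ignore_case:
        a, b = a.lower(), b.lower()
    # ratio() is a similarity score between 0.0 and 1.0
    score = SequenceMatcher(None, a, b).ratio()
    if score == 1.0:
        return "Full Match"
    if score >= threshold:
        # Include the score as a whole-number percentage
        return f"Partial Match ({score:.0%})"
    return "No Match"


# Sample data taken from the expected-result table in the question
df = pd.DataFrame({
    "one": ["apple", "banana", "kiwi", "mango"],
    "two": ["Apple", "bannana", "dragonfruit", "mango"],
})

# zip() walks both columns in parallel, one pair of strings per row
df["result"] = [match_label(a, b) for a, b in zip(df["one"], df["two"])]
print(df)
```

One caveat, assuming SequenceMatcher scoring: "apple" and "Apple" score 0.8, which is below the 0.85 threshold, so that row comes out as "No Match" rather than the "Partial Match" shown in the expected result. Lowering the threshold to 0.8 would label it a partial match, while ignore_case=True would make it a "Full Match". If you prefer fuzzywuzzy, the same per-row pattern should apply, keeping in mind that fuzz.ratio returns a score from 0 to 100 rather than 0.0 to 1.0.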