
How do I compare two columns in a Pandas DataFrame to find the match percentages and return a value based on that logic?

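I need to compare two columns in a Pandas DataFrame and fuzzy-match them.

If the fuzzy match is above a certain percentage (say 85), I need to return either that percentage or the string "Partial Match".

If it is an exact match, return "Full Match".

If there is no match, return "No Match".

Solutions I have tried:

Attempt #1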

conditions = [
    (df['one'] == df['two']),
    fuzz.ratio((df['one'], df['two'])) > 80,
    fuzz.ratio((df['one'], df['two'])) <= 80,
]

choices = ["FULL Match", fuzz.ratio((df['one'], df['two'])), "NO MATCH"]

df['result'] = np.select(condition, choices, default=np.nan)

================================================== ==================

Attempt #2

df['result'] = np.where(fuzz.ratio(df['one'], df['two']) >= 85, "Partial Match", 'No Match')

import pandas as pd
import numpy as np
from fuzzywuzzy import fuzz
import os


df = pd.read_csv('data.csv')

x = fuzz.ratio(df['one'], df['two']) >= 85

df['result'] = np.where(x, "Match", 'No Match')

Expected result

         one          two    result
 0    apple        Apple     Partial Match
 1  banana       bannana     Partial Match
 2     kiwi  dragonfruit     No Match
 3    mango        mango     Full Match

================================================== ==================

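Error messages:

Attempt #1

IndexError: tuple index out of range

Attempt #2

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

Try combining the last two commands into one: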

df['result'] = np.where(fuzz.ratio(df['one'], df['two']) >= 85, "Match", 'No Match')
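Both errors most likely come from the same place: fuzz.ratio compares two individual strings, whereas df['one'] and df['two'] are whole Series. In Attempt #1 the extra parentheses also pass a single tuple instead of two arguments. Merging the statements into one line does not change what is passed in. A minimal sketch of one way around this is to score each row separately and only then build the vectorized conditions. The sample data is copied from the expected result in the question, the `score` column is a name introduced here for illustration, and difflib's SequenceMatcher is used as a standard-library stand-in for fuzz.ratio, so the scores may differ slightly from fuzzywuzzy's.

```python
from difflib import SequenceMatcher

import numpy as np
import pandas as pd

# Sample data taken from the expected result in the question.
df = pd.DataFrame({
    "one": ["apple", "banana", "kiwi", "mango"],
    "two": ["Apple", "bannana", "dragonfruit", "mango"],
})

# Score one pair of strings at a time, on a 0-100 scale.
# Assumption: SequenceMatcher stands in for fuzz.ratio here.
df["score"] = [
    round(100 * SequenceMatcher(None, a, b).ratio())
    for a, b in zip(df["one"], df["two"])
]

# The conditions are now boolean Series; the first one that holds wins.
conditions = [df["one"] == df["two"], df["score"] >= 85]
choices = ["Full Match", "Partial Match"]

# A string default keeps every choice the same type.
df["result"] = np.select(conditions, choices, default="No Match")

print(df)
```

If fuzzywuzzy is installed, replacing the SequenceMatcher expression with fuzz.ratio(a, b) inside the list comprehension should work the same way, since it already returns a score from 0 to 100. Note that with this case-sensitive scoring the 'apple' / 'Apple' row lands in "No Match" rather than the "Partial Match" shown in the expected result.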

I think this would solve the problem:

from difflib import SequenceMatcher

def similar(a, b):
    # ratio() returns a similarity score between 0.0 and 1.0
    match_score = SequenceMatcher(None, a, b).ratio()
    if match_score == 1.0:
        result = "Full Match"
    elif match_score >= .85:
        result = "Partial Match"
    else:
        result = "No Match"
    return result

# Apply row by row; `row` avoids shadowing the outer DataFrame name `df`
df["result"] = df[['one', 'two']].apply(lambda row: similar(row.one, row.two), axis=1)
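One caveat with the function above: SequenceMatcher is case-sensitive, so 'apple' against 'Apple' scores 0.8 and falls below the 0.85 threshold, while the expected result in the question lists that row as "Partial Match". The sketch below is a hypothetical variant, not part of the answer: `similar_ignore_case` and its `threshold` parameter are names made up here. It reserves "Full Match" for exact equality and lowercases both strings before scoring.

```python
from difflib import SequenceMatcher


def similar_ignore_case(a, b, threshold=0.85):
    # Hypothetical variant of similar(): exact equality is a full match,
    # everything else is scored case-insensitively.
    if a == b:
        return "Full Match"
    score = SequenceMatcher(None, a.lower(), b.lower()).ratio()
    return "Partial Match" if score >= threshold else "No Match"


# Sample pairs copied from the expected result in the question.
pairs = [
    ("apple", "Apple"),
    ("banana", "bannana"),
    ("kiwi", "dragonfruit"),
    ("mango", "mango"),
]
results = [similar_ignore_case(a, b) for a, b in pairs]
print(results)
```

It can be applied to the DataFrame in the same way as similar, by calling it inside the row-wise apply.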

