
How do I compare two columns in a Pandas DataFrame to find the match percentages and return a value based on that logic?

I need to compare two columns in a Pandas DataFrame using fuzzy matching.

If the fuzzy match is above a certain percentage (e.g. 85), I need to return that percentage, or the string "Partial Match".

If the values match fully, return "Full Match".

If they don't match, return "No Match".

Solutions I've tried:

Attempt #1

 conditions = [
     (df['one'] == df['two']),
     fuzz.ratio((df['one'],df['two'])) > 80,
     fuzz.ratio((df['one'],df['two'])) <= 80]

 choices = ["FULL Match", fuzz.ratio((df['one'],df['two'])), "NO MATCH"]

 df['result'] = np.select(condition, choices, default = np.nan)

====================================================================

Attempt #2

 df['result'] = np.where(fuzz.ratio(df['one'], df['two']) >= 85, "Partial Match", 'No Match')

 import pandas as pd
 import numpy as np
 from fuzzywuzzy import fuzz
 import os


 df = pd.read_csv('data.csv')

 x = fuzz.ratio(df['one'], df['two']) >= 85

 df['result'] = np.where(x, "Match", 'No Match')

Expected Result

         one          two         result
 0     apple        Apple  Partial Match
 1    banana      bannana  Partial Match
 2      kiwi  dragonfruit       No Match
 3     mango        mango     Full Match

====================================================================

Error Message:

Attempt #1

IndexError: tuple index out of range

Attempt #2

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

Try combining the last two commands into one:

df['result'] = np.where(fuzz.ratio(df['one'], df['two']) >= 85, "Match", 'No Match')
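Note that fuzz.ratio scores one pair of strings per call, so handing it two whole columns appears to be what triggers the errors quoted in the question. Scoring pair by pair is one way around that. The sketch below is only an illustration: it assumes difflib.SequenceMatcher as a standard-library stand-in for fuzz.ratio (scaled to 0-100), and pair_scores is a hypothetical helper name, not part of any library.

```python
from difflib import SequenceMatcher


def pair_scores(left, right):
    """Score each (left, right) pair of strings on a 0-100 scale.

    Hypothetical helper: SequenceMatcher stands in for fuzz.ratio here.
    """
    return [round(100 * SequenceMatcher(None, a, b).ratio())
            for a, b in zip(left, right)]


# Sample values taken from the expected-result table in the question.
one = ["apple", "banana", "kiwi", "mango"]
two = ["Apple", "bannana", "dragonfruit", "mango"]

scores = pair_scores(one, two)

# Apply the 85 cutoff to each score, not to the columns as a whole.
labels = ["Match" if s >= 85 else "No Match" for s in scores]
```

With a DataFrame, df['one'] and df['two'] could be passed in place of the two lists, and the returned list assigned to a new column such as df['score'].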

I think this does the trick:

from difflib import SequenceMatcher

def similar(a, b):
    # ratio() returns a similarity score between 0.0 and 1.0
    match_score = SequenceMatcher(None, a, b).ratio()
    if match_score == 1.0:
        result = "Full Match"
    elif match_score >= .85:
        result = "Partial Match"
    else:
        result = "No Match"
    return result

# Apply the function row by row (axis=1) to the two columns
df["result"] = df[['one', 'two']].apply(lambda row: similar(row.one, row.two), axis=1)
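One thing to check against the expected result: compared case-sensitively, "apple" and "Apple" score 0.8 with SequenceMatcher, which is below the 0.85 cutoff, so that row would come out as "No Match". A possible variation, sketched under the assumption that case should be ignored when scoring but not when testing for a full match, is shown here. The name classify and its threshold parameter are illustrative only.

```python
from difflib import SequenceMatcher


def classify(a, b, threshold=0.85):
    """Label one pair of strings (hypothetical helper, not a library API)."""
    # Only an exact, case-sensitive equality counts as a full match.
    if a == b:
        return "Full Match"
    # Assumption: case differences should not lower the fuzzy score.
    score = SequenceMatcher(None, a.lower(), b.lower()).ratio()
    return "Partial Match" if score >= threshold else "No Match"


# Sample pairs taken from the expected-result table in the question.
pairs = [("apple", "Apple"), ("banana", "bannana"),
         ("kiwi", "dragonfruit"), ("mango", "mango")]
results = [classify(a, b) for a, b in pairs]
```

The function can be applied row by row in the same way as similar above. If the percentage itself is wanted instead of the "Partial Match" label, the partial branch could return the score multiplied by 100.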
