
How do I compare two columns in a Pandas DataFrame to find the match percentage and return a value based on that logic?

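I need to compare two columns in a Pandas DataFrame and do a fuzzy match.

If the fuzzy match is above a certain percentage (say, 85), I need to return either that percentage or the string "Partial Match".

If it is an exact match, return "Full Match".

If there is no match, return "No Match".

Solutions I have tried:

Attempt #1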

conditions = [
    (df['one'] == df['two']),
    fuzz.ratio((df['one'], df['two'])) > 80,
    fuzz.ratio((df['one'], df['two'])) <= 80]

choices = ["FULL Match", fuzz.ratio((df['one'], df['two'])), "NO MATCH"]

df['result'] = np.select(condition, choices, default=np.nan)

================================================== ==================

Attempt #2

df['result'] = np.where(fuzz.ratio(df['one'], df['two']) >= 85, "Partial Match", 'No Match')

import pandas as pd
import numpy as np
from fuzzywuzzy import fuzz
import os


df = pd.read_csv('data.csv')

x = fuzz.ratio(df['one'], df['two']) >= 85

df['result'] = np.where(x, "Match", 'No Match')

Expected result

         one          two    result
 0    apple        Apple     Partial Match
 1  banana       bannana     Partial Match
 2     kiwi  dragonfruit     No Match
 3    mango        mango     Full Match

================================================== ==================

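Error messages:

Attempt #1

IndexError: tuple index out of range

Attempt #2

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

Try combining the last two commands into one: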

df['result'] = np.where(fuzz.ratio(df['one'], df['two']) >= 85, "Match", 'No Match')
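Note that this one-liner still hands two whole columns to `fuzz.ratio`, which appears to be designed to compare one pair of strings at a time, so it may well raise the same ValueError. Below is a minimal sketch of scoring row by row instead. It assumes the standard library's `difflib.SequenceMatcher` as a stand-in scorer (its scores can differ from those of `fuzz.ratio`), and `ratio_percent` and the sample lists are hypothetical names that only mirror the question's data:

```python
from difflib import SequenceMatcher


def ratio_percent(a: str, b: str) -> int:
    """Similarity of two strings on a 0-100 scale (difflib stand-in, not fuzz.ratio)."""
    return round(100 * SequenceMatcher(None, a, b).ratio())


# Hypothetical sample data mirroring the 'one' and 'two' columns in the question.
one = ["apple", "banana", "kiwi", "mango"]
two = ["Apple", "bannana", "dragonfruit", "mango"]

# Score each pair of strings individually instead of passing whole columns.
scores = [ratio_percent(a, b) for a, b in zip(one, two)]

# Threshold the per-row scores; this yields one boolean per row.
above_threshold = [score >= 85 for score in scores]

print(scores)
print(above_threshold)
```

With a DataFrame, the same idea could be written as `df['score'] = [ratio_percent(a, b) for a, b in zip(df['one'], df['two'])]`, after which `np.where(df['score'] >= 85, ...)` receives a proper element-wise boolean Series.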

I think this should solve the problem:

from difflib import SequenceMatcher

def similar(a, b):
    # ratio() returns a float in [0, 1]; 1.0 means the strings are identical
    match_score = SequenceMatcher(None, a, b).ratio()
    if match_score == 1.0:
        result = "Full Match"
    elif match_score >= .85:
        result = "Partial Match"
    else:
        result = "No Match"
    return result

# Apply row by row; the lambda argument is named row so it does not shadow df
df["result"] = df[['one', 'two']].apply(lambda row: similar(row.one, row.two), axis=1)
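One caveat: `SequenceMatcher` is case-sensitive, so "apple" against "Apple" scores 0.8 and would come out as "No Match" with the function above, while the expected result in the question lists it as "Partial Match". A possible variant, sketched under the assumption that case differences should count as a partial match: test exact equality first, then compare lowercased strings. The name `classify`, the `threshold` parameter, and the sample pairs are hypothetical:

```python
from difflib import SequenceMatcher


def classify(a: str, b: str, threshold: float = 0.85) -> str:
    """Label a pair of strings as Full, Partial, or No Match."""
    # Exact, case-sensitive equality is the only way to get a full match.
    if a == b:
        return "Full Match"
    # Assumption: ignore case when measuring similarity for partial matches.
    score = SequenceMatcher(None, a.lower(), b.lower()).ratio()
    return "Partial Match" if score >= threshold else "No Match"


# Hypothetical pairs taken from the expected-result table in the question.
pairs = [
    ("apple", "Apple"),
    ("banana", "bannana"),
    ("kiwi", "dragonfruit"),
    ("mango", "mango"),
]
results = [classify(a, b) for a, b in pairs]
print(results)
```

On a DataFrame this could be applied as `df["result"] = [classify(a, b) for a, b in zip(df["one"], df["two"])]`.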

