
Match pandas dataframe name columns to another dataframe's columns?

I'm very new to Python. How can I match the text in one dataframe against the text in another? (Please edit this question if I've phrased it badly.)

For example, given this input data:

 df1 =
        id       Names
    0  123  Simpson J.
    1  456  Snoop Dogg

 df2 =
              Names
    0  John Simpson
    1    Snoop Dogg
    2       S. Dogg
    3       Mr Dogg

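Is there a way (perhaps using findall, match, or some Python package) to count how many times the name belonging to each id appears in df2, so that I get something like this result: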

 result =
        id  Names_appeared
    0  123               1
    1  456               3

I'm looking for a short explanation and some URLs to help me understand.
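A much cruder heuristic than fuzzy matching may be enough for data like this. The sketch below assumes that two names refer to the same person whenever they share a word of two or more letters (in practice, usually the surname); one-letter initials such as "J." or "S." are ignored. The helpers tokens and count_token_matches are hypothetical names for illustration; only re and pandas are used.

```python
import re

import pandas as pd


def tokens(name):
    """Hypothetical helper: lower-case words of a name, dropping one-letter initials."""
    # Split on any run of non-letters, so "Simpson J." -> ["simpson", "j", ""].
    return {t for t in re.split(r"[^a-z]+", name.lower()) if len(t) > 1}


def count_token_matches(short_names, long_names):
    """Hypothetical helper: for each short name, count long names sharing a word."""
    long_tokens = [tokens(n) for n in long_names]
    # A non-empty set intersection means at least one shared word.
    return [sum(bool(tokens(s) & lt) for lt in long_tokens) for s in short_names]


df1 = pd.DataFrame({'id': [123, 456], 'Names': ['Simpson J.', 'Snoop Dogg']})
df2 = pd.DataFrame({'Names': ['John Simpson', 'Snoop Dogg', 'S. Dogg', 'Mr Dogg']})

result = df1[['id']].copy()
result['Names_appeared'] = count_token_matches(df1['Names'], df2['Names'])
print(result)
```

On the sample data this gives 1 for id 123 and 3 for id 456, matching the desired result, but it would also count different people who merely share a surname, so it is only a starting point.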

Here is an example using fuzzywuzzy, as IanS suggested:

import pandas as pd
from fuzzywuzzy import fuzz


def fuzz_count(shortList, longList, minScore):
    """ Count fuzz ratios greater than or equal to a minimum score. """
    results = []
    for s in shortList:
        scores = [fuzz.ratio(s, l) for l in longList]
        count = sum(x >= minScore for x in scores)
        results.append(count)
    return results


data1 = {'id': pd.Series([123, 456]),
         'Names': pd.Series(['Simpson J.', 'Snoop Dogg'])}
data2 = {'Names': pd.Series(['John Simpson', 'Snoop Dogg', 'S. Dogg', 'Mr Dogg'])}
df1 = pd.DataFrame(data1)
df2 = pd.DataFrame(data2)

result = pd.DataFrame()
result['id'] = df1['id']
counts = fuzz_count(df1['Names'], df2['Names'], minScore=60)  # [1, 2]
result['Names_appeared'] = counts

print(result)  #     id  Names_appeared
               # 0  123               1
               # 1  456               2
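To see why a given pair is or isn't counted, it helps to print every pairwise score. The sketch below swaps fuzzywuzzy for the standard library's difflib.SequenceMatcher, which fuzzywuzzy itself falls back to when python-Levenshtein is not installed, so the scores should be close to, though not guaranteed identical with, fuzz.ratio (which also rounds to an integer). The names ratio and fuzz_count_difflib are hypothetical.

```python
from difflib import SequenceMatcher

import pandas as pd


def ratio(a, b):
    """Hypothetical helper: similarity of two strings on a 0-100 scale."""
    return 100 * SequenceMatcher(None, a, b).ratio()


def fuzz_count_difflib(short_list, long_list, min_score):
    """Count entries of long_list scoring >= min_score against each short name."""
    return [sum(ratio(s, candidate) >= min_score for candidate in long_list)
            for s in short_list]


df1 = pd.DataFrame({'id': [123, 456], 'Names': ['Simpson J.', 'Snoop Dogg']})
df2 = pd.DataFrame({'Names': ['John Simpson', 'Snoop Dogg', 'S. Dogg', 'Mr Dogg']})

# Print every pairwise score to see which side of the threshold each pair falls on.
for s in df1['Names']:
    for candidate in df2['Names']:
        print(f"{s!r:14} vs {candidate!r:16} {ratio(s, candidate):5.1f}")

result = df1[['id']].assign(
    Names_appeared=fuzz_count_difflib(df1['Names'], df2['Names'], min_score=60))
print(result)
```

Here "Mr Dogg" scores about 58.8 against "Snoop Dogg", just below the threshold of 60, so id 456 gets 2 rather than the 3 shown in the desired result. Lowering min_score would pick it up, at the cost of admitting more false matches on larger data.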


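Statement: The technical posts on this site are licensed under CC BY-SA 4.0. If you repost them, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.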

 