Match pandas dataframe name columns to another dataframe's columns?

I'm very new to Python. How can I match the text in one dataframe against the text in another? (Please edit this question if I have asked it wrongly.)

For example, given this input data:

 df1 =
          id  Names 
        0 123 Simpson J.
        1 456 Snoop Dogg

 df2 =
            Names 
         0  John Simpson
         1  Snoop Dogg
         2  S. Dogg
         3  Mr Dogg

Is there a way (maybe using findall or match, or any Python package) to count how many times the name belonging to each id appears in df2, producing something like this result:

result = 
              id  Names_appeared 
            0 123   1 
            1 456   3

I'm looking for a brief explanation and some URLs to help me understand.

Here's an example using fuzzywuzzy, as IanS suggested:

import pandas as pd
from fuzzywuzzy import fuzz


def fuzz_count(shortList, longList, minScore):
    """ Count fuzz ratios greater than or equal to a minimum score. """
    results = []
    for s in shortList:
        scores = [fuzz.ratio(s, l) for l in longList]
        count = sum(x >= minScore for x in scores)
        results.append(count)
    return results


data1 = {'id': pd.Series([123, 456]),
         'Names': pd.Series(['Simpson J.', 'Snoop Dogg'])}
data2 = {'Names': pd.Series(['John Simpson', 'Snoop Dogg', 'S. Dogg', 'Mr Dogg'])}
df1 = pd.DataFrame(data1)
df2 = pd.DataFrame(data2)

result = pd.DataFrame()
result['id'] = df1['id']
counts = fuzz_count(df1['Names'], df2['Names'], minScore=60)  # [1, 2]
result['Names_appeared'] = counts

print(result)  #     id  Names_appeared
               # 0  123               1
               # 1  456               2
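
If fuzzywuzzy is not installed, the same counting idea can be sketched with the standard library's difflib.SequenceMatcher. This is only an illustrative sketch, not the answer's code: the helper names similarity and count_matches are made up here, and the scores may differ slightly from fuzz.ratio depending on which backend fuzzywuzzy uses.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Return a 0-100 similarity score based on difflib's ratio (hypothetical helper)."""
    return round(100 * SequenceMatcher(None, a, b).ratio())


def count_matches(short_names, long_names, min_score):
    """For each short name, count the long names scoring at least min_score."""
    return [
        sum(similarity(s, l) >= min_score for l in long_names)
        for s in short_names
    ]


# Same sample data as in the question, kept as plain lists.
ids = [123, 456]
names1 = ['Simpson J.', 'Snoop Dogg']
names2 = ['John Simpson', 'Snoop Dogg', 'S. Dogg', 'Mr Dogg']

counts = count_matches(names1, names2, min_score=60)
for id_, count in zip(ids, counts):
    print(id_, count)
```

In this sketch 'Mr Dogg' scores just under 60 against 'Snoop Dogg', which is why id 456 gets a count of 2 rather than the 3 the question hoped for. Lowering min_score to 55 picks it up for this sample, though a lower threshold is likely to admit more false matches on real data.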
