Perform matching between two lists in Python

I have two tables. I want to compare two columns and get, for each value, whether it matches, the row numbers of the matching rows, and the number of matches. How can I get the expected result using Python?

df1:

Name   Score  Year
Pat    82     1990
Chris  38     1993
Pat    92     1994
Noris  88     1997
Mit    62     1999
Chen   58     1996

df2:

Applicant
Pat
Chris
Meet

Expected result

Applicant  Match (Y/N)  Matched Row reference  Count
Pat        Y            1,3                    2
Chris      Y            2                      1
Meet       N            NA                     0

Approach based on a pandas outer merge

  • An outer merge of df1 and df2 creates all required rows, including applicants with no match in df1
  • Group by Applicant so we can aggregate the count
  • Use a function to aggregate each group and produce the values for the desired output row
  • To keep the index of df1 after the merge, call reset_index() before merging and set_index('index') afterwards
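To make the steps above concrete, here is a minimal sketch of the intermediate frame produced by the merge alone, before any grouping. The sample data is copied from the question; the variable name `merged` is only for illustration.

```python
import pandas as pd

# Sample data copied from the question
df1 = pd.DataFrame({
    "Name": ["Pat", "Chris", "Pat", "Noris", "Mit", "Chen"],
    "Score": [82, 38, 92, 88, 62, 58],
    "Year": [1990, 1993, 1994, 1997, 1999, 1996],
})
df2 = pd.DataFrame({"Applicant": ["Pat", "Chris", "Meet"]})

# reset_index() copies df1's row labels into a regular column named "index",
# so they survive the merge; set_index("index") restores them afterwards.
merged = (df1
          .reset_index()
          .merge(df2, left_on="Name", right_on="Applicant", how="outer")
          .set_index("index"))

print(merged)
```

In the printed frame, the row for "Meet" has NaN in the index and in every df1 column, and the rows for names that are not applicants have NaN in Applicant. The grouping step relies on this: an applicant whose Name values are all missing is a non-match.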

Code

import pandas as pd

def process(df1, df2):
    ' Overall function for generating desired output '
    
    def create_result(df, columns = ["Match (Y/N)", "Matched Row reference", "Count"]):
        '''
            Creates the desired columns of df2

            Input:
                df      - Dataframe from groupby
                columns - column names for df2
            Output:
                Pandas Series corresponding to row in df2
        '''
        cnt = df['Name'].count()   # Number of items in group
        if cnt > 0:
            # Convert index to comma delimited list, numbered from 1 (i.e. int(x) + 1)
            indexes = ','.join(str(int(x) + 1) for x in df.index.to_list())
        else:
            indexes = "NA"   # empty dataframe

        lst = ["Y" if cnt > 0 else 'N', 
                indexes,
                df.shape[0] if cnt > 0 else 0]

        return pd.Series(lst, index = columns)

    # Merge df1 with df2 but
    # add method from [to keep index after merge](https://stackoverflow.com/questions/11976503/how-to-keep-index-when-using-pandas-merge/11982843#11982843)
    # to have the index of df1 in the merge result
    return (df1
            .reset_index()
            .merge(df2, left_on = "Name", right_on = 'Applicant', how = "outer")
            .set_index('index')
            .groupby('Applicant')
            .apply(create_result))

Usage

from io import StringIO

s = '''Name Score   Year
Pat 82  1990
Chris   38  1993
Pat 92  1994
Noris   88  1997
Mit 62  1999
Chen    58  1996'''

# Split on runs of whitespace so the sample parses whether it uses tabs or spaces
df1 = pd.read_csv(StringIO(s), sep = r'\s+')

s = '''Applicant
Pat
Chris
Meet'''

df2 = pd.read_csv(StringIO(s), sep = r'\s+')

from pprint import pprint as pp
pp(process(df1, df2))            # process and pretty print result

Output

                   Match (Y/N) Matched Row reference  Count
Applicant                                         
Chris               Y                     2           1
Meet                N                    NA           0
Pat                 Y                   1,3           2
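Note that the rows come out sorted alphabetically by Applicant, because groupby sorts its keys by default, whereas the expected result follows the order of df2. One possible way to restore that order is to reindex the result by df2's Applicant column. In this sketch the result frame is rebuilt by hand from the printed output so the snippet runs on its own; the name `ordered` is only for illustration.

```python
import pandas as pd

# The result printed above, rebuilt by hand so this snippet is self-contained
result = pd.DataFrame(
    {"Match (Y/N)": ["Y", "N", "Y"],
     "Matched Row reference": ["2", "NA", "1,3"],
     "Count": [1, 0, 2]},
    index=pd.Index(["Chris", "Meet", "Pat"], name="Applicant"),
)
df2 = pd.DataFrame({"Applicant": ["Pat", "Chris", "Meet"]})

# Reorder the rows to follow df2, then turn Applicant back into a column
ordered = (result
           .reindex(df2["Applicant"].to_list())
           .rename_axis("Applicant")
           .reset_index())

print(ordered)
```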

I would use NumPy and pandas for this, because I believe pandas is a great library for handling large amounts of data. Even though you do not have much data, I would still recommend pandas.

For information about pandas, see https://pandas.pydata.org/

You can create a DataFrame from lists with pandas:

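# ListForName, ListForScore and ListForYear are placeholders for your own lists
data = {'Name': ListForName, 
    'Score': ListForScore, 
    'Year': ListForYear}
df = pd.DataFrame(data)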

For more information about creating a DataFrame, see https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.html
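As a concrete sketch of building a DataFrame from lists, here are the sample values from the question. `ListForYear` is an assumed name for the list of years, chosen to match the other two placeholders.

```python
import pandas as pd

# Sample values copied from the question's df1
ListForName = ["Pat", "Chris", "Pat", "Noris", "Mit", "Chen"]
ListForScore = [82, 38, 92, 88, 62, 58]
ListForYear = [1990, 1993, 1994, 1997, 1999, 1996]  # assumed name

# Each dictionary key becomes a column; all lists must have the same length
df1 = pd.DataFrame({"Name": ListForName,
                    "Score": ListForScore,
                    "Year": ListForYear})

print(df1)
```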

I would use a basic for loop for matching. For example:

match = 0
for i in range(len(FirstList)):
    for j in range(len(SecondList)):
        # Compare row i of the first table with row j of the second
        if FirstList['Column'].iloc[i] == SecondList['Column'].iloc[j]:
            match += 1
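The loop above only counts the total number of matches. A minimal sketch of how the same loop idea could be extended to build the full expected table is shown below. It uses the sample data from the question; the names `records` and `result` are only for illustration, and this is one possible approach rather than the only one.

```python
import pandas as pd

# Sample data copied from the question
df1 = pd.DataFrame({
    "Name": ["Pat", "Chris", "Pat", "Noris", "Mit", "Chen"],
    "Score": [82, 38, 92, 88, 62, 58],
    "Year": [1990, 1993, 1994, 1997, 1999, 1996],
})
df2 = pd.DataFrame({"Applicant": ["Pat", "Chris", "Meet"]})

records = []
for applicant in df2["Applicant"]:
    # 1-based positions of the rows in df1 whose Name equals this applicant
    rows = [i + 1 for i, name in enumerate(df1["Name"]) if name == applicant]
    records.append({
        "Applicant": applicant,
        "Match (Y/N)": "Y" if rows else "N",
        "Matched Row reference": ",".join(map(str, rows)) if rows else "NA",
        "Count": len(rows),
    })

result = pd.DataFrame(records)
print(result)
```

Because the outer loop walks df2 in order, the rows of `result` follow the order of the applicants. The nested scan does one comparison per pair of rows, which is fine for small tables but slow for large ones.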
