Perform matching between two lists in Python

Question

I have two tables. I want to compare two columns and get the number of matching rows and their row numbers. How can I get the expected result using Python?

df1:

Name	Score	Year
Pat	82	1990
Chris	38	1993
Pat	92	1994
Noris	88	1997
Mit	62	1999
Chen	58	1996

df2:

Applicant
Pat
Chris
Meet

Expected result

Applicant	Match (Y/N)	Matched Row reference	Count
Pat	Y	1,3	2
Chris	Y	2	1
Meet	N	NA	0

Answer 1

Approach based on a pandas outer merge

An outer merge of df1 and df2 creates all the required rows
Group by Applicant so we can aggregate the count
Use a function to aggregate each group and produce the values for the desired output row
To keep the index of df1 after the merge, use the reset_index()/set_index() technique from the Stack Overflow answer on how to keep the index when using pandas merge (linked in the code comments)
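The index-preserving step can be looked at in isolation. The following is a minimal sketch, not part of the original answer; the frames `left` and `right` and their index values are made up for illustration. It assumes the left frame's index is unnamed, so that `reset_index()` stores it in a column called `index`.

```python
import pandas as pd

# Hypothetical frames; the index values 10, 20, 30 are chosen to be easy to spot
left = pd.DataFrame({"Name": ["Pat", "Chris", "Pat"]}, index=[10, 20, 30])
right = pd.DataFrame({"Applicant": ["Pat"]})

# A plain merge discards the left index and renumbers the rows from 0
plain = left.merge(right, left_on="Name", right_on="Applicant", how="inner")

# reset_index() moves the index into a column named "index", which survives
# the merge and can be restored afterwards with set_index()
kept = (left.reset_index()
            .merge(right, left_on="Name", right_on="Applicant", how="inner")
            .set_index("index"))

print(plain.index.tolist())  # [0, 1]
print(kept.index.tolist())   # [10, 30]
```

With the original row labels preserved, each group can report which rows of df1 it matched.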

Code

import pandas as pd

def process(df1, df2):
    ' Overall function for generating desired output '
    
    def create_result(df, columns = ["Match (Y/N)", "Matched Row reference", "Count"]):
        '''
            Creates the desired columns of df2

            Input:
                df      - Dataframe from groupby
                columns - column names for df2
            Output:
                Pandas Series corresponding to row in df2
        '''
        cnt = df['Name'].count()   # Number of items in group
        if cnt > 0:
            # Convert index to comma delimited list, numbered from 1 (i.e. int(x) + 1)
            indexes = ','.join(str(int(x) + 1) for x in df.index.to_list())
        else:
            indexes = "NA"   # empty dataframe

        lst = ["Y" if cnt > 0 else 'N', 
                indexes,
                df.shape[0] if cnt > 0 else 0]

        return pd.Series(lst, index = columns)

    # Merge df1 with df2, but wrap the merge in reset_index()/set_index() so that
    # the result keeps the index of df1. Technique from:
    # https://stackoverflow.com/questions/11976503/how-to-keep-index-when-using-pandas-merge/11982843#11982843
    return (df1
            .reset_index()
            .merge(df2, left_on = "Name", right_on = 'Applicant', how = "outer")
            .set_index('index')
            .groupby(['Applicant'])
            .apply(lambda grp_df: create_result(grp_df)))

Usage

from io import StringIO

s = '''Name Score   Year
Pat 82  1990
Chris   38  1993
Pat 92  1994
Noris   88  1997
Mit 62  1999
Chen    58  1996'''

df1 = pd.read_csv(StringIO(s), sep = r'\s+', engine = 'python')   # columns are separated by whitespace

s = '''Applicant
Pat
Chris
Meet'''

df2 = pd.read_csv(StringIO(s), sep = r'\s+', engine = 'python')

from pprint import pprint as pp
pp(process(df1, df2))            # process and pretty print result

Output

                   Match (Y/N) Matched Row reference  Count
Applicant                                         
Chris               Y                     2           1
Meet                N                    NA           0
Pat                 Y                   1,3           2

Answer 2

I would use numpy and pandas for this, because I believe pandas is a great library for dealing with large amounts of data. Although you do not have a large amount of data, I would still recommend pandas.

For information about pandas, see https://pandas.pydata.org/

You can build a DataFrame from lists with pandas:

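# ListForName, ListForScore and ListForYear are your own lists of equal length
data = {'Name': ListForName, 
    'Score': ListForScore, 
    'Year': ListForYear}
df = pd.DataFrame(data)   # assumes pandas was imported as pd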

For more information about creating a DataFrame, see https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.html

I would use a basic for loop for matching. For example:

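# FirstList and SecondList are DataFrames; 'Column' stands for the column to compare
match = 0
for i in range(len(FirstList)):
    for j in range(len(SecondList)):
        # Count every pair of rows whose values are equal
        if FirstList['Column'].iloc[i] == SecondList['Column'].iloc[j]:
            match += 1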
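The loop above only counts matching pairs. As a rough sketch of how the same loop idea could be extended to produce the table the question asks for, the example below rebuilds df1 and df2 from the question and records the 1-based row numbers of each match. The variable names `hits`, `rows` and `result` are illustrative and do not appear in the original answer.

```python
import pandas as pd

# Data from the question
df1 = pd.DataFrame({"Name": ["Pat", "Chris", "Pat", "Noris", "Mit", "Chen"],
                    "Score": [82, 38, 92, 88, 62, 58],
                    "Year": [1990, 1993, 1994, 1997, 1999, 1996]})
df2 = pd.DataFrame({"Applicant": ["Pat", "Chris", "Meet"]})

rows = []
for applicant in df2["Applicant"]:
    # 1-based positions of the rows in df1 whose Name equals this applicant
    hits = [i + 1 for i, name in enumerate(df1["Name"]) if name == applicant]
    rows.append({"Applicant": applicant,
                 "Match (Y/N)": "Y" if hits else "N",
                 "Matched Row reference": ",".join(map(str, hits)) if hits else "NA",
                 "Count": len(hits)})

result = pd.DataFrame(rows)
print(result)
```

This keeps the applicants in their original order, whereas the groupby in Answer 1 sorts them alphabetically. For large tables the nested comparison is slow, and the merge-based approach is likely the better choice.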
