Perform a match between two lists in Python

Question

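I have two tables. I want to compare two columns and get, for each applicant, whether there is a match, the matching row numbers, and the number of matching rows. How can I get the expected result using Python? df1:

Name	Score	Year
Pat	82	1990
Chris	38	1993
Pat	92	1994
Noris	88	1997
Mit	62	1999
Chen	58	1996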

df2:

Applicant
Pat
Chris
Meet

Expected result

Applicant	Match (Y/N)	Matched Row reference	Count
Pat	Y	1,3	2
Chris	Y	2	1
Meet	N	NA	0

Answer 1

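An approach based on a pandas outer merge

An outer merge of df1 and df2 creates all the rows needed
Group by Applicant so the counts can be aggregated per applicant
Use a function to aggregate each group and generate the values of the desired output row
To keep the index of df1 after the merge, use the reset_index/set_index method from "to keep index after merge"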
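To make the first and last steps above concrete, here is a minimal sketch of the intermediate merged frame, before any grouping. The data is retyped from the question; the variable name `merged` is only for illustration.

```python
import pandas as pd

df1 = pd.DataFrame({"Name": ["Pat", "Chris", "Pat", "Noris", "Mit", "Chen"],
                    "Score": [82, 38, 92, 88, 62, 58],
                    "Year": [1990, 1993, 1994, 1997, 1999, 1996]})
df2 = pd.DataFrame({"Applicant": ["Pat", "Chris", "Meet"]})

# reset_index() turns df1's row labels into a column named "index",
# so they survive the merge and can be restored with set_index().
merged = (df1
          .reset_index()
          .merge(df2, left_on="Name", right_on="Applicant", how="outer")
          .set_index("index"))

# Rows of df1 with no applicant (Noris, Mit, Chen) get NaN in "Applicant";
# the applicant with no row in df1 (Meet) gets NaN in "Name" and in the index.
print(merged)
```

Because the index now contains a NaN, it is stored as floats, which is why the answer's code converts each label with int(x) before adding 1.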

Code

import pandas as pd
import numpy as np

def process(df1, df2):
    ' Overall function for generating desired output '
    
    def create_result(df, columns = ["Match (Y/N)", "Matched Row reference", "Count"]):
        '''
            Creates the desired columns of df2

            Input:
                df      - Dataframe from groupby
                columns - column names for df2
            Output:
                Pandas Series corresponding to row in df2
        '''
        cnt = df['Name'].count()   # Number of items in group
        if cnt > 0:
            # Convert index to comma delimited list, numbered from 1 (i.e. int(x) + 1)
            indexes = ','.join(str(int(x) + 1) for x in df.index.to_list())
        else:
            indexes = "NA"   # empty dataframe

        lst = ["Y" if cnt > 0 else 'N', 
                indexes,
                df.shape[0] if cnt > 0 else 0]

        return pd.Series(lst, index = columns)

    # Merge df1 with df2 but
    # add method from [to keep index after merge](https://stackoverflow.com/questions/11976503/how-to-keep-index-when-using-pandas-merge/11982843#11982843)
    # to have the index of df1 in the merge result
    return (df1
            .reset_index()
            .merge(df2, left_on = "Name", right_on = 'Applicant', how = "outer")
            .set_index('index')
            .groupby(['Applicant'])
            .apply(lambda grp_df: create_result(grp_df)))

Usage

from io import StringIO

s = '''Name Score   Year
Pat 82  1990
Chris   38  1993
Pat 92  1994
Noris   88  1997
Mit 62  1999
Chen    58  1996'''

df1 = pd.read_csv(StringIO(s), sep = r'\s+', engine = 'python')   # columns are separated by whitespace

s = '''Applicant
Pat
Chris
Meet'''

df2 = pd.read_csv(StringIO(s), sep = '\t', engine = 'python')

from pprint import pprint as pp
pp(process(df1, df2))            # process and pretty print result

Output

                   Match (Y/N) Matched Row reference  Count
Applicant                                         
Chris               Y                     2           1
Meet                N                    NA           0
Pat                 Y                   1,3           2

Answer 2

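I would use numpy and pandas for this, because I believe pandas is an excellent library for handling large amounts of data. Even though you don't have a lot of data, I would still suggest using pandas.

Information about pandas: https://pandas.pydata.org/

You can create a DataFrame from your lists with pandas: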

# ListForName, ListForScore and ListForYear are placeholder lists of equal length
data = {'Name': ListForName, 
    'Score': ListForScore, 
    'Year': ListForYear}

More information about creating a DataFrame: https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.html
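As a minimal sketch of that step, the two tables from the question could be built like this. The list names `names`, `scores` and `years` are hypothetical; the values are copied from the question.

```python
import pandas as pd

# Hypothetical list names; the values are the rows of df1 from the question.
names = ["Pat", "Chris", "Pat", "Noris", "Mit", "Chen"]
scores = [82, 38, 92, 88, 62, 58]
years = [1990, 1993, 1994, 1997, 1999, 1996]

# Each dictionary key becomes a column; all lists must have the same length.
df1 = pd.DataFrame({"Name": names, "Score": scores, "Year": years})
df2 = pd.DataFrame({"Applicant": ["Pat", "Chris", "Meet"]})

print(df1.shape)
```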

I would use a basic for loop for the matching. For example:

match = 0
# FirstList and SecondList are DataFrames; 'Column' is a placeholder column name
for i in range(len(FirstList)):
    for j in range(len(SecondList)):
        if FirstList['Column'].iloc[i] == SecondList['Column'].iloc[j]:
            match += 1
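The loop above yields a single total. To get the per-applicant table the question asks for, the same loop idea might be extended as in the sketch below. This is one possible reading of the approach, not the answerer's own code; `rows`, `refs` and `result` are illustrative names, and row references are numbered from 1 as in the expected result.

```python
import pandas as pd

df1 = pd.DataFrame({"Name": ["Pat", "Chris", "Pat", "Noris", "Mit", "Chen"]})
df2 = pd.DataFrame({"Applicant": ["Pat", "Chris", "Meet"]})

rows = []
for applicant in df2["Applicant"]:
    # 1-based positions of the rows in df1 whose Name equals this applicant
    refs = [i + 1 for i, name in enumerate(df1["Name"]) if name == applicant]
    rows.append({
        "Applicant": applicant,
        "Match (Y/N)": "Y" if refs else "N",
        "Matched Row reference": ",".join(map(str, refs)) if refs else "NA",
        "Count": len(refs),
    })

result = pd.DataFrame(rows)
print(result)
```

Unlike the merge-based answer, this keeps the applicants in the order they appear in df2, at the cost of scanning df1 once per applicant.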

Solution 1
0 2022-09-06 23:36:41

Solution 2
0 2022-09-06 23:50:56
