
Python - Pandas - finding matches between two data frames

Suppose I have 2 pandas data frames, both sharing the same column names, like this:

    name:       dob:       role:
James Franco   1-1-1980    Actor
Cameron Diaz   4-2-1976    Actor
Jim Carey      12-1-1968   Actor
Miley Cyrus    5-23-1987   Actor


    name:       dob:       role:
50 cent       4-6-1984     Singer
lil baby      12-1-1990    Singer
ghostmane     8-10-1989    Singer
Miley Cyrus   5-23-1987    Singer

Say I want to identify individuals who share the same name and dob and exist in both dataframes (and thus have 2 different roles).

How can I do this?

It would be similar to what I would get if everything existed in 1 dataframe and I ran df.groupby(["name", "dob"]).count().

I would like to be able to identify these individuals, print them, and count the number of occurrences.

Thank you

import pandas as pd
df2 = pd.concat([df, df1])  # stack the two dfs (DataFrame.append was removed in pandas 2.0)
dfnew = df2[df2.duplicated(subset=["name:", "dob:"], keep=False)]  # keep every row duplicated on the columns you want to check
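
As a rough, self-contained sketch of the concatenate-and-flag-duplicates idea above, here it is run on the sample rows from the question, with a count per person added. The names actors, singers, combined, dupes and counts are made up for illustration, and the sketch assumes neither frame already contains the same name/dob pair twice (otherwise repeats within one frame would be flagged too).

```python
import pandas as pd

# Sample data from the question; the column names keep the trailing colon.
actors = pd.DataFrame({
    "name:": ["James Franco", "Cameron Diaz", "Jim Carey", "Miley Cyrus"],
    "dob:": ["1-1-1980", "4-2-1976", "12-1-1968", "5-23-1987"],
    "role:": ["Actor"] * 4,
})
singers = pd.DataFrame({
    "name:": ["50 cent", "lil baby", "ghostmane", "Miley Cyrus"],
    "dob:": ["4-6-1984", "12-1-1990", "8-10-1989", "5-23-1987"],
    "role:": ["Singer"] * 4,
})

# Stack both frames, then keep every row whose (name, dob) pair repeats.
combined = pd.concat([actors, singers], ignore_index=True)
dupes = combined[combined.duplicated(subset=["name:", "dob:"], keep=False)]

# Number of occurrences per matching person.
counts = dupes.groupby(["name:", "dob:"]).size()

print(dupes)
print(counts)
```

Here dupes holds one row per role for each matching person, so it answers the "print them" part, and counts answers the "count the occurrences" part.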

Well, this will give you just the matches:

df1.merge(df2, on=["name:", "dob:"])

Output:

         name:       dob: role:_x role:_y
0  Miley Cyrus  5-23-1987   Actor  Singer
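
For anyone who wants to run the inner merge end to end, a minimal sketch follows. It rebuilds df1 and df2 from the question's sample rows; the suffixes ("_df1", "_df2") and the variable name matches are illustrative choices, not part of the answer above.

```python
import pandas as pd

# The two frames from the question; column names keep the trailing colon.
df1 = pd.DataFrame({
    "name:": ["James Franco", "Cameron Diaz", "Jim Carey", "Miley Cyrus"],
    "dob:": ["1-1-1980", "4-2-1976", "12-1-1968", "5-23-1987"],
    "role:": ["Actor"] * 4,
})
df2 = pd.DataFrame({
    "name:": ["50 cent", "lil baby", "ghostmane", "Miley Cyrus"],
    "dob:": ["4-6-1984", "12-1-1990", "8-10-1989", "5-23-1987"],
    "role:": ["Singer"] * 4,
})

# The default how="inner" keeps only (name, dob) pairs found in both frames.
# Custom suffixes replace the default _x / _y on the clashing role column.
matches = df1.merge(df2, on=["name:", "dob:"], suffixes=("_df1", "_df2"))

print(matches)
print("people in both frames:", len(matches))
```

Each matching person ends up on a single row with both roles side by side, so len(matches) is the number of people present in both frames.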

You can use an outer join to get all the results and filter them as you see fit:

df1.merge(df2, how="outer", on=["name:", "dob:"])

Output:

          name:       dob: role:_x role:_y
0  James Franco   1-1-1980   Actor     NaN
1  Cameron Diaz   4-2-1976   Actor     NaN
2     Jim Carey  12-1-1968   Actor     NaN
3   Miley Cyrus  5-23-1987   Actor  Singer
4       50 cent   4-6-1984     NaN  Singer
5      lil baby  12-1-1990     NaN  Singer
6     ghostmane  8-10-1989     NaN  Singer
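
One possible way to do the filtering step, sketched under the assumption that the frames look like the question's sample data: passing indicator=True to merge adds a _merge column that records whether each row came from the left frame, the right frame, or both. The variable names merged, in_both and only_df1 are made up for this example.

```python
import pandas as pd

# The two frames from the question; column names keep the trailing colon.
df1 = pd.DataFrame({
    "name:": ["James Franco", "Cameron Diaz", "Jim Carey", "Miley Cyrus"],
    "dob:": ["1-1-1980", "4-2-1976", "12-1-1968", "5-23-1987"],
    "role:": ["Actor"] * 4,
})
df2 = pd.DataFrame({
    "name:": ["50 cent", "lil baby", "ghostmane", "Miley Cyrus"],
    "dob:": ["4-6-1984", "12-1-1990", "8-10-1989", "5-23-1987"],
    "role:": ["Singer"] * 4,
})

# indicator=True adds a "_merge" column: "left_only", "right_only" or "both".
merged = df1.merge(df2, how="outer", on=["name:", "dob:"], indicator=True)

# Filter on the indicator instead of checking for NaN in the role columns.
in_both = merged[merged["_merge"] == "both"]
only_df1 = merged[merged["_merge"] == "left_only"]

print(in_both)
print(merged["_merge"].value_counts())
```

Filtering on _merge avoids relying on NaN in the role columns, which could be ambiguous if a role were genuinely missing in the source data.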
