Python - Pandas - finding matches between two data frames

Say I have 2 pandas data frames that share the same column names, like so:

name:          dob:        role:
James Franco   1-1-1980    Actor
Cameron Diaz   4-2-1976    Actor
Jim Carey      12-1-1968   Actor
Miley Cyrus    5-23-1987   Actor


name:          dob:        role:
50 cent        4-6-1984    Singer
lil baby       12-1-1990   Singer
ghostmane      8-10-1989   Singer
Miley Cyrus    5-23-1987   Singer

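Say I want to identify the individuals who have the same name and date of birth and who appear in both data frames (and who therefore have two different roles).

How can I do this?

It would be similar to running df.groupby(["name", "dob"]).count() if everything lived in a single data frame.

I'd like to be able to identify these individuals, print them, and count the number of occurrences.

Thanks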

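import pandas as pd
df2 = pd.concat([df, df1], ignore_index=True)  # stack the two dfs (DataFrame.append was removed in pandas 2.0)
dfnew = df2[df2.duplicated(subset=["name:", "dob:"], keep=False)]  # keep every row whose name/dob pair appears more than once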
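To also get the count the question asks for, the duplicated rows can be grouped on the same two columns. Here is a minimal sketch on a trimmed copy of the sample data; the frame names `df` and `df1` follow the snippet above, while the `occurrences` column name is just an illustrative choice.

```python
import pandas as pd

# Trimmed copies of the two sample frames from the question.
df = pd.DataFrame({
    "name:": ["Jim Carey", "Miley Cyrus"],
    "dob:": ["12-1-1968", "5-23-1987"],
    "role:": ["Actor", "Actor"],
})
df1 = pd.DataFrame({
    "name:": ["ghostmane", "Miley Cyrus"],
    "dob:": ["8-10-1989", "5-23-1987"],
    "role:": ["Singer", "Singer"],
})

# Stack both frames, then keep every row whose name/dob pair repeats.
df2 = pd.concat([df, df1], ignore_index=True)
dfnew = df2[df2.duplicated(subset=["name:", "dob:"], keep=False)]

# Count how many times each repeated person occurs.
counts = (
    dfnew.groupby(["name:", "dob:"])
    .size()
    .reset_index(name="occurrences")  # hypothetical column name
)

print(dfnew)
print(counts)
```

Note that this treats a person listed twice within a single frame as a duplicate too, so it only answers "present in both frames" if each frame has no repeated name/dob pairs of its own.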

Well, this will give you the matches:

df1.merge(df2, on=["name:", "dob:"])

Output:

         name:       dob: role:_x role:_y
0  Miley Cyrus  5-23-1987   Actor  Singer
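If the default `_x` / `_y` suffixes are hard to read, `merge` accepts a `suffixes` argument, and the length of the result gives the number of people found in both frames. A small sketch, assuming trimmed versions of the sample frames; the suffix strings are arbitrary choices for illustration.

```python
import pandas as pd

# Trimmed copies of the two sample frames from the question.
df1 = pd.DataFrame({
    "name:": ["Jim Carey", "Miley Cyrus"],
    "dob:": ["12-1-1968", "5-23-1987"],
    "role:": ["Actor", "Actor"],
})
df2 = pd.DataFrame({
    "name:": ["ghostmane", "Miley Cyrus"],
    "dob:": ["8-10-1989", "5-23-1987"],
    "role:": ["Singer", "Singer"],
})

# Inner join (the default) on name and dob; label the clashing "role:" columns.
matches = df1.merge(df2, on=["name:", "dob:"], suffixes=("_df1", "_df2"))

print(matches)
print("people found in both frames:", len(matches))
```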

You can use an outer join to get all the results and filter them as needed:

df1.merge(df2, how="outer", on=["name:", "dob:"])

Output:

          name:       dob: role:_x role:_y
0  James Franco   1-1-1980   Actor     NaN
1  Cameron Diaz   4-2-1976   Actor     NaN
2     Jim Carey  12-1-1968   Actor     NaN
3   Miley Cyrus  5-23-1987   Actor  Singer
4       50 cent   4-6-1984     NaN  Singer
5      lil baby  12-1-1990     NaN  Singer
6     ghostmane  8-10-1989     NaN  Singer
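One way to do that filtering is to pass `indicator=True`, which adds a `_merge` column saying whether each row came from the left frame, the right frame, or both. A rough sketch on trimmed sample data; the variable names `in_both` and `only_df1` are made up for this example.

```python
import pandas as pd

# Trimmed copies of the two sample frames from the question.
df1 = pd.DataFrame({
    "name:": ["Jim Carey", "Miley Cyrus"],
    "dob:": ["12-1-1968", "5-23-1987"],
    "role:": ["Actor", "Actor"],
})
df2 = pd.DataFrame({
    "name:": ["ghostmane", "Miley Cyrus"],
    "dob:": ["8-10-1989", "5-23-1987"],
    "role:": ["Singer", "Singer"],
})

# Outer join keeps every person; "_merge" records where each row came from.
merged = df1.merge(df2, how="outer", on=["name:", "dob:"], indicator=True)

in_both = merged[merged["_merge"] == "both"]        # present in both frames
only_df1 = merged[merged["_merge"] == "left_only"]  # present only in df1

print(in_both)
print(only_df1)
```

The row order of an outer merge can differ between pandas versions, so it is safer to filter on `_merge` than to rely on positions.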
