[英]Python - Pandas - finding matches between two data frames
假设我有 2 个 pandas 数据帧,它们共享相同的列名,如下所示:
name: dob: role:
James Franco 1-1-1980 Actor
Cameron Diaz 4-2-1976 Actor
Jim Carey 12-1-1968 Actor
Miley Cyrus 5-23-1987 Actor
name: dob: role:
50 cent 4-6-1984 Singer
lil baby 12-1-1990 Singer
ghostmane 8-10-1989 Singer
Miley Cyrus 5-23-1987 Singer
假设我想识别具有相同姓名和出生日期的个人,并且存在于两个数据框中(因此,有两个不同的角色)。
我怎样才能做到这一点?
类似于如果一切都存在于 1 dataframe 中,我做了一个 df.groupby(["name", "dob"]).count())
我希望能够识别这些人,打印它们,并计算出现次数。
谢谢
df2=df.append(df1)#append the two dfs
dfnew=df2[df2.duplicated(subset=['name:',"dob:"], keep=False)]#keep all duplicated on the columns you wires to check
好吧,这将为您提供匹配项:
df1.merge(df2, on=["name:","dob:",])
output:
name: dob: role:_x role:_y
0 Miley Cyrus 5-23-1987 Actor Singer
您可以使用外部联接来获取所有结果并根据需要过滤它们:
df1.merge(df2, how="outer", on=["name:","dob:",])
Output:
name: dob: role:_x role:_y
0 James Franco 1-1-1980 Actor NaN
1 Cameron Diaz 4-2-1976 Actor NaN
2 Jim Carey 12-1-1968 Actor NaN
3 Miley Cyrus 5-23-1987 Actor Singer
4 50 cent 4-6-1984 NaN Singer
5 lil baby 12-1-1990 NaN Singer
6 ghostmane 8-10-1989 NaN Singer
声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.