Python - Pandas - finding matches between two data frames
Suppose I have 2 pandas DataFrames that share the same column names, as shown below:
name:         dob:       role:
James Franco  1-1-1980   Actor
Cameron Diaz  4-2-1976   Actor
Jim Carey     12-1-1968  Actor
Miley Cyrus   5-23-1987  Actor

name:         dob:       role:
50 cent       4-6-1984   Singer
lil baby      12-1-1990  Singer
ghostmane     8-10-1989  Singer
Miley Cyrus   5-23-1987  Singer
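Suppose I want to identify the individuals who have the same name and date of birth and who appear in both DataFrames (and therefore have two different roles).
How can I do this?
Something similar to what I would do if everything were in one DataFrame, where I would run df.groupby(["name", "dob"]).count()
I would like to be able to identify these people, print them, and count the number of occurrences.
Thanks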
import pandas as pd
df2 = pd.concat([df, df1])  # stack the two dfs (DataFrame.append was removed in pandas 2.0)
dfnew = df2[df2.duplicated(subset=["name:", "dob:"], keep=False)]  # keep every row duplicated on the columns you want to check
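To make the stack-and-filter idea above concrete, and to cover the counting part of the question, here is a minimal sketch. The sample data is copied from the question; the variable names actors, singers, combined, matches and counts are only illustrative and do not come from the original posts.

```python
import pandas as pd

# Sample data copied from the question; column names keep their trailing colons.
actors = pd.DataFrame({
    "name:": ["James Franco", "Cameron Diaz", "Jim Carey", "Miley Cyrus"],
    "dob:": ["1-1-1980", "4-2-1976", "12-1-1968", "5-23-1987"],
    "role:": ["Actor"] * 4,
})
singers = pd.DataFrame({
    "name:": ["50 cent", "lil baby", "ghostmane", "Miley Cyrus"],
    "dob:": ["4-6-1984", "12-1-1990", "8-10-1989", "5-23-1987"],
    "role:": ["Singer"] * 4,
})

# Stack both frames, then keep every row whose (name, dob) pair occurs more than once.
combined = pd.concat([actors, singers], ignore_index=True)
matches = combined[combined.duplicated(subset=["name:", "dob:"], keep=False)]
print(matches)

# Count how many times each matching person occurs.
counts = matches.groupby(["name:", "dob:"]).size()
print(counts)
```

One caveat: this approach also flags a person who is listed twice inside a single DataFrame, so it is only equivalent to "present in both" if each frame has no internal duplicates on name and dob.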
Well, this will give you the matches:
df1.merge(df2, on=["name:","dob:",])
Output:
   name:        dob:       role:_x  role:_y
0  Miley Cyrus  5-23-1987  Actor    Singer
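For anyone who wants to run the inner merge end to end, a self-contained sketch might look like the following. The frame names actors and singers and the suffixes "_actor" and "_singer" are my own choices for readability; the suffixes parameter simply replaces the default "_x" and "_y" labels.

```python
import pandas as pd

# Sample data copied from the question.
actors = pd.DataFrame({
    "name:": ["James Franco", "Cameron Diaz", "Jim Carey", "Miley Cyrus"],
    "dob:": ["1-1-1980", "4-2-1976", "12-1-1968", "5-23-1987"],
    "role:": ["Actor"] * 4,
})
singers = pd.DataFrame({
    "name:": ["50 cent", "lil baby", "ghostmane", "Miley Cyrus"],
    "dob:": ["4-6-1984", "12-1-1990", "8-10-1989", "5-23-1987"],
    "role:": ["Singer"] * 4,
})

# An inner merge (the default) keeps only (name, dob) pairs found in both frames.
both = actors.merge(singers, on=["name:", "dob:"], suffixes=("_actor", "_singer"))
print(both)

# Number of people who appear in both frames.
print(len(both))
```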
You can use an outer join to get all the results and filter them as needed:
df1.merge(df2, how="outer", on=["name:","dob:",])
Output:
   name:         dob:       role:_x  role:_y
0  James Franco  1-1-1980   Actor    NaN
1  Cameron Diaz  4-2-1976   Actor    NaN
2  Jim Carey     12-1-1968  Actor    NaN
3  Miley Cyrus   5-23-1987  Actor    Singer
4  50 cent       4-6-1984   NaN      Singer
5  lil baby      12-1-1990  NaN      Singer
6  ghostmane     8-10-1989  NaN      Singer
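One possible way to do the filtering step on the outer join is the indicator=True option of merge, which adds a "_merge" column saying whether each row came from the left frame, the right frame, or both. This is a sketch rather than what the answer above necessarily had in mind, and the variable names are illustrative. Row order of an outer merge can differ between pandas versions, so the sketch does not rely on it.

```python
import pandas as pd

# Sample data copied from the question.
actors = pd.DataFrame({
    "name:": ["James Franco", "Cameron Diaz", "Jim Carey", "Miley Cyrus"],
    "dob:": ["1-1-1980", "4-2-1976", "12-1-1968", "5-23-1987"],
    "role:": ["Actor"] * 4,
})
singers = pd.DataFrame({
    "name:": ["50 cent", "lil baby", "ghostmane", "Miley Cyrus"],
    "dob:": ["4-6-1984", "12-1-1990", "8-10-1989", "5-23-1987"],
    "role:": ["Singer"] * 4,
})

# Outer merge keeps every person; "_merge" records where each row was found.
merged = actors.merge(singers, how="outer", on=["name:", "dob:"], indicator=True)

# Filter on the indicator column as needed.
in_both = merged[merged["_merge"] == "both"]
only_actors = merged[merged["_merge"] == "left_only"]
only_singers = merged[merged["_merge"] == "right_only"]

print(in_both)
print(merged["_merge"].value_counts())
```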