
Python - Pandas - finding matches between two data frames

Suppose I have 2 pandas data frames, both sharing the same column names, like this:

name:          dob:        role:
James Franco   1-1-1980    Actor
Cameron Diaz   4-2-1976    Actor
Jim Carey      12-1-1968   Actor
Miley Cyrus    5-23-1987   Actor


name:         dob:         role:
50 cent       4-6-1984     Singer
lil baby      12-1-1990    Singer
ghostmane     8-10-1989    Singer
Miley Cyrus   5-23-1987    Singer

Say I want to identify individuals who share the same name and dob and exist in both dataframes (and thus have two different roles).

How can I do this?

This would be similar to having everything in one dataframe and running df.groupby(["name", "dob"]).count().

I would like to be able to identify these individuals, print them, and count the number of occurrences.

Thank you

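import pandas as pd
df2 = pd.concat([df, df1], ignore_index=True)  # stack the two dataframes (DataFrame.append was removed in pandas 2.0)
dfnew = df2[df2.duplicated(subset=["name:", "dob:"], keep=False)]  # keep every row whose name/dob pair occurs more than once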
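
The question also asks for a count of occurrences, which the snippet above does not show. Here is a minimal sketch of one way to get it, assuming the sample data from the question (the names `actors`, `singers`, `combined`, `dupes`, and `counts` are made up for illustration):

```python
import pandas as pd

# Sample data mirroring the question; the column names keep their trailing colon.
actors = pd.DataFrame({
    "name:": ["James Franco", "Cameron Diaz", "Jim Carey", "Miley Cyrus"],
    "dob:": ["1-1-1980", "4-2-1976", "12-1-1968", "5-23-1987"],
    "role:": ["Actor"] * 4,
})
singers = pd.DataFrame({
    "name:": ["50 cent", "lil baby", "ghostmane", "Miley Cyrus"],
    "dob:": ["4-6-1984", "12-1-1990", "8-10-1989", "5-23-1987"],
    "role:": ["Singer"] * 4,
})

# Stack both frames, then keep every row whose (name:, dob:) pair repeats.
combined = pd.concat([actors, singers], ignore_index=True)
dupes = combined[combined.duplicated(subset=["name:", "dob:"], keep=False)]

# Count how many times each repeated pair occurs.
counts = dupes.groupby(["name:", "dob:"]).size().reset_index(name="occurrences")

print(dupes)
print(counts)
```

Note that this approach also flags a pair that repeats inside a single dataframe, so it only answers "exists in both" if each frame has no internal duplicates.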

An inner merge on the shared columns will give you just the matches:

df1.merge(df2, on=["name:", "dob:"])

Output:

         name:       dob: role:_x role:_y
0  Miley Cyrus  5-23-1987   Actor  Singer
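
For a self-contained version of the inner merge that also prints and counts the matches, something like the following sketch may help. It assumes the sample data from the question; the `suffixes` values and the variable name `matches` are illustrative choices, not part of the original answer.

```python
import pandas as pd

# Sample data mirroring the question; the column names keep their trailing colon.
df1 = pd.DataFrame({
    "name:": ["James Franco", "Cameron Diaz", "Jim Carey", "Miley Cyrus"],
    "dob:": ["1-1-1980", "4-2-1976", "12-1-1968", "5-23-1987"],
    "role:": ["Actor"] * 4,
})
df2 = pd.DataFrame({
    "name:": ["50 cent", "lil baby", "ghostmane", "Miley Cyrus"],
    "dob:": ["4-6-1984", "12-1-1990", "8-10-1989", "5-23-1987"],
    "role:": ["Singer"] * 4,
})

# Inner merge (the default) keeps only name/dob pairs present in both frames.
# The suffixes label which frame each "role:" column came from.
matches = df1.merge(df2, on=["name:", "dob:"], suffixes=("_df1", "_df2"))

print(matches)
print(f"{len(matches)} individual(s) appear in both dataframes")
```

If either dataframe contains the same name/dob pair more than once, the merge returns one row per combination, so `len(matches)` would then overcount individuals.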

You can use an outer join to get all the results and filter them as you see fit:

df1.merge(df2, how="outer", on=["name:", "dob:"])

Output:

          name:       dob: role:_x role:_y
0  James Franco   1-1-1980   Actor     NaN
1  Cameron Diaz   4-2-1976   Actor     NaN
2     Jim Carey  12-1-1968   Actor     NaN
3   Miley Cyrus  5-23-1987   Actor  Singer
4       50 cent   4-6-1984     NaN  Singer
5      lil baby  12-1-1990     NaN  Singer
6     ghostmane  8-10-1989     NaN  Singer
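
One possible way to do that filtering is the `indicator=True` option of `merge`, which adds a `_merge` column marking each row as `left_only`, `right_only`, or `both`. The sketch below assumes the sample data from the question; the variable names `merged`, `both`, `only_df1`, and `only_df2` are illustrative.

```python
import pandas as pd

# Sample data mirroring the question; the column names keep their trailing colon.
df1 = pd.DataFrame({
    "name:": ["James Franco", "Cameron Diaz", "Jim Carey", "Miley Cyrus"],
    "dob:": ["1-1-1980", "4-2-1976", "12-1-1968", "5-23-1987"],
    "role:": ["Actor"] * 4,
})
df2 = pd.DataFrame({
    "name:": ["50 cent", "lil baby", "ghostmane", "Miley Cyrus"],
    "dob:": ["4-6-1984", "12-1-1990", "8-10-1989", "5-23-1987"],
    "role:": ["Singer"] * 4,
})

# indicator=True adds a "_merge" column recording where each row came from.
merged = df1.merge(df2, how="outer", on=["name:", "dob:"], indicator=True)

both = merged[merged["_merge"] == "both"]            # present in both frames
only_df1 = merged[merged["_merge"] == "left_only"]   # present only in df1
only_df2 = merged[merged["_merge"] == "right_only"]  # present only in df2

print(both)
print(merged["_merge"].value_counts())
```

Row order in the outer merge can differ between pandas versions, so it is safer to filter on `_merge` than to rely on row positions.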


 