Suppose I have 2 pandas data frames, both sharing the same column names, like this:
name: dob: role:
James Franco 1-1-1980 Actor
Cameron Diaz 4-2-1976 Actor
Jim Carey 12-1-1968 Actor
Miley Cyrus 5-23-1987 Actor
name: dob: role:
50 cent 4-6-1984 Singer
lil baby 12-1-1990 Singer
ghostmane 8-10-1989 Singer
Miley Cyrus 5-23-1987 Singer
And say I wanted to identify individuals who share the same name and dob, and exist in both dataframes (and thus, have 2 different roles).
How can I do this?
similar to if everything existed in 1 dataframe, and I did a df.groupby(["name", "dob"]).count())
I would like to be able to identify these individuals, print them, and count the number of occurrences.
Thank you
df2=df.append(df1)#append the two dfs
dfnew=df2[df2.duplicated(subset=['name:',"dob:"], keep=False)]#keep all duplicated on the columns you wires to check
Well,this will give you just the matches:
df1.merge(df2, on=["name:","dob:",])
output:
name: dob: role:_x role:_y
0 Miley Cyrus 5-23-1987 Actor Singer
You can use an outer join to get all the results and filter them as you see fit:
df1.merge(df2, how="outer", on=["name:","dob:",])
Output:
name: dob: role:_x role:_y
0 James Franco 1-1-1980 Actor NaN
1 Cameron Diaz 4-2-1976 Actor NaN
2 Jim Carey 12-1-1968 Actor NaN
3 Miley Cyrus 5-23-1987 Actor Singer
4 50 cent 4-6-1984 NaN Singer
5 lil baby 12-1-1990 NaN Singer
6 ghostmane 8-10-1989 NaN Singer
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.