I am trying to merge two data frames and eliminate dupes.
Here is DF#1:
import pandas as pd
data1 = {'id':['168'],'group_id':['360002136432'],'ticket_form_id':['360000159712']}
df1 = pd.DataFrame(data1)
print(df1)
Here is DF #2
data2 = {'id':['362936613051','362936613051','362936613051'],'ticket_id':['168','168','168']}
df2 = pd.DataFrame(data2)
print(df2)
I am trying to merge, or consolidate, DF#1 and DF#2, so it looks like this.
id group_id ticket_form_id ID
168 360002136432 360000159712 362936613051
It would be some kind of inner join (I think) between DF#1.id and DF#2.ticket_id, but I keep getting a bunch of dupes in the merged data frame. How can I eliminate dupes in the merged data frame.
So, for ID = 8, I would expect to see 362563740691 and for ID = 10, I would expect to see 362563746711.
Instead, I'm seeing 362785076491 for ID = 8.
Your df2 do have a lot of duplicate values. I don't know if you need to keep redundant data, if you don't need to you can drop df2 duplicates
df2.drop_duplicates(inplace = True)
print(df1.merge(df2, left_on = 'id', right_on = 'ticket_id'))
This immediately remove the duplicate rows in the final dataframe.
Another possibility is to remove duplicated rows after the merge.
df1 = df1.merge(df2, left_on = 'id', right_on = 'ticket_id', how = 'inner')
df1.drop_duplicates(inplace = True)
print(df1)
假设df2中的所有id / ticket_id对均已复制,如示例所示:
df_new=df1.merge(df2[~df2.duplicated()==1], left_on='id', right_on='ticket_id').drop('ticket_id', axis=1)
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.