
Merge two different DataFrames with pandas

I am new to pandas and need to complete the following task. Is there an efficient way to do it? There are 2 different DataFrames, dfa and dfb:

dfa

dfb

I used this to merge them together:

df = pd.merge(dfa, dfb, left_on = ['a_retry','a_cca', 'a_rssif', 'a_lqif'], right_on = ['b_retry','b_cca', 'b_rssif', 'b_lqif'])

This is the df output I got: df

However, this is not what I expected. It is fine that the merged DataFrame contains all columns, but the number of rows should not exceed that of the smaller DataFrame (dfa), which means row 3 must be dropped. The expected result is: (image) How can I do that? Thanks.

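This is expected, because there are rows that are duplicated across all 4 key columns. merge pairs every left row with every right row that has the same key values, so k duplicates in dfa and m duplicates in dfb produce k × m rows in the output.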
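To see why the row count grows, here is a minimal sketch. The data is hypothetical (the frames from the question are not available as text), and the columns a_val and b_val are invented only to make the rows distinguishable.

```python
import pandas as pd

# Hypothetical data: the key (0, 1, -50, 100) appears twice in each frame.
dfa = pd.DataFrame({
    "a_retry": [0, 0, 1],
    "a_cca": [1, 1, 2],
    "a_rssif": [-50, -50, -60],
    "a_lqif": [100, 100, 90],
    "a_val": ["a0", "a1", "a2"],  # invented label column
})
dfb = pd.DataFrame({
    "b_retry": [0, 0, 1, 3],
    "b_cca": [1, 1, 2, 4],
    "b_rssif": [-50, -50, -60, -70],
    "b_lqif": [100, 100, 90, 80],
    "b_val": ["b0", "b1", "b2", "b3"],  # invented label column
})

a_keys = ["a_retry", "a_cca", "a_rssif", "a_lqif"]
b_keys = ["b_retry", "b_cca", "b_rssif", "b_lqif"]

# Inner merge: the duplicated key yields 2 x 2 = 4 rows, the unique key 1 row.
df = pd.merge(dfa, dfb, left_on=a_keys, right_on=b_keys)
print(len(dfa), len(dfb), len(df))  # 3 4 5
```

The merged frame ends up with more rows than dfa even though it is an inner join, which matches the behaviour described in the question.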

So you need to remove the duplicate rows with drop_duplicates:

dfa = dfa.drop_duplicates(subset=['a_retry','a_cca', 'a_rssif', 'a_lqif'])
dfb = dfb.drop_duplicates(subset=['b_retry','b_cca', 'b_rssif', 'b_lqif'])
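As a rough end-to-end sketch of this approach, again on hypothetical data (a_val and b_val are invented columns), the keys are deduplicated first and then merged. The validate argument of pd.merge is optional; it is used here only to check that the keys really are unique on both sides.

```python
import pandas as pd

# Hypothetical data with one key duplicated in both frames.
dfa = pd.DataFrame({
    "a_retry": [0, 0, 1],
    "a_cca": [1, 1, 2],
    "a_rssif": [-50, -50, -60],
    "a_lqif": [100, 100, 90],
    "a_val": ["a0", "a1", "a2"],  # invented label column
})
dfb = pd.DataFrame({
    "b_retry": [0, 0, 1, 3],
    "b_cca": [1, 1, 2, 4],
    "b_rssif": [-50, -50, -60, -70],
    "b_lqif": [100, 100, 90, 80],
    "b_val": ["b0", "b1", "b2", "b3"],  # invented label column
})

a_keys = ["a_retry", "a_cca", "a_rssif", "a_lqif"]
b_keys = ["b_retry", "b_cca", "b_rssif", "b_lqif"]

# Keep only the first row for each key combination (the default keep="first").
dfa_unique = dfa.drop_duplicates(subset=a_keys)
dfb_unique = dfb.drop_duplicates(subset=b_keys)

# validate="one_to_one" raises MergeError if either side still has duplicate keys.
df = pd.merge(dfa_unique, dfb_unique,
              left_on=a_keys, right_on=b_keys,
              validate="one_to_one")
print(df[["a_val", "b_val"]])
```

Note that this discards the later duplicates (a1 and b1 in this sketch), so it only fits if those rows are not needed.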

But if you need to match the duplicate rows with each other, that is possible with a new counter column created by cumcount, which is then used as an additional key for the merge:

dfa['new'] = dfa.groupby(['a_retry','a_cca', 'a_rssif', 'a_lqif']).cumcount()
dfb['new'] = dfb.groupby(['b_retry','b_cca', 'b_rssif', 'b_lqif']).cumcount()

df = (pd.merge(dfa, 
               dfb, 
               left_on = ['a_retry','a_cca', 'a_rssif', 'a_lqif', 'new'], 
               right_on = ['b_retry','b_cca', 'b_rssif','b_lqif', 'new']).drop('new', axis=1))
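The same idea can be sketched on hypothetical data (a_val and b_val are invented columns) to show how the counter pairs the first duplicate in dfa with the first duplicate in dfb, the second with the second, and so on. This variant uses assign so the original frames are not modified, which is a stylistic choice and not part of the answer above.

```python
import pandas as pd

# Hypothetical data with one key duplicated in both frames.
dfa = pd.DataFrame({
    "a_retry": [0, 0, 1],
    "a_cca": [1, 1, 2],
    "a_rssif": [-50, -50, -60],
    "a_lqif": [100, 100, 90],
    "a_val": ["a0", "a1", "a2"],  # invented label column
})
dfb = pd.DataFrame({
    "b_retry": [0, 0, 1, 3],
    "b_cca": [1, 1, 2, 4],
    "b_rssif": [-50, -50, -60, -70],
    "b_lqif": [100, 100, 90, 80],
    "b_val": ["b0", "b1", "b2", "b3"],  # invented label column
})

a_keys = ["a_retry", "a_cca", "a_rssif", "a_lqif"]
b_keys = ["b_retry", "b_cca", "b_rssif", "b_lqif"]

# cumcount numbers the rows within each key group: 0, 1, 2, ...
left = dfa.assign(new=dfa.groupby(a_keys).cumcount())
right = dfb.assign(new=dfb.groupby(b_keys).cumcount())

# Adding the counter to the keys makes every key combination unique,
# so each left row matches at most one right row.
df = pd.merge(left, right,
              left_on=a_keys + ["new"],
              right_on=b_keys + ["new"]).drop("new", axis=1)
print(df[["a_val", "b_val"]])
```

Because every left row can match at most one right row, the result can never have more rows than dfa.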
