Merge two different dataframes with Pandas

Question

I am new to pandas and need to complete the following task. Is there an effective way to do it? There are two different dataframes, dfa and dfb:

I used this to merge them together:

import pandas as pd
df = pd.merge(dfa, dfb, left_on=['a_retry', 'a_cca', 'a_rssif', 'a_lqif'], right_on=['b_retry', 'b_cca', 'b_rssif', 'b_lqif'])

I got the df output:

However, this is not what I expected. It is fine that the merged dataframe contains all columns, but it must not have more rows than the smaller dataframe (i.e. dfa), which means row 3 must be dropped. The expected result is:

How can I do that? Thanks.

Answer 1

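This is expected, because there are duplicate value combinations across all 4 key columns. An inner merge pairs every matching row in dfa with every matching row in dfb, so a key that appears twice on each side produces 2 x 2 = 4 rows.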
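To see the row multiplication, here is a minimal sketch with small hypothetical frames (the real dfa and dfb were not shown in the question). The key column names come from the question; the payload columns a_val and b_val and all values are made up for illustration.

```python
import pandas as pd

a_keys = ['a_retry', 'a_cca', 'a_rssif', 'a_lqif']
b_keys = ['b_retry', 'b_cca', 'b_rssif', 'b_lqif']

# Hypothetical data: 'a_val' / 'b_val' are made-up payload columns.
dfa = pd.DataFrame([[0, 1, -50, 100, 'x'],
                    [0, 1, -50, 100, 'y'],   # same key as the row above
                    [1, 0, -60, 90, 'z']], columns=a_keys + ['a_val'])
dfb = pd.DataFrame([[0, 1, -50, 100, 'p'],
                    [0, 1, -50, 100, 'q'],   # same key as the row above
                    [1, 0, -60, 90, 'r'],
                    [2, 2, -70, 80, 's']], columns=b_keys + ['b_val'])

df = pd.merge(dfa, dfb, left_on=a_keys, right_on=b_keys)

# Key (0, 1, -50, 100) is duplicated on both sides -> 2 x 2 = 4 rows,
# key (1, 0, -60, 90) -> 1 row, so 5 rows in total versus 3 rows in dfa.
print(len(dfa), len(df))   # 3 5

# duplicated(keep=False) flags every row whose key combination is repeated.
print(dfa.duplicated(subset=a_keys, keep=False).tolist())   # [True, True, False]
```

Checking duplicated() on the key columns of both frames is a quick way to find which keys cause the extra rows.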

So you need to remove the duplicate rows with drop_duplicates:

dfa = dfa.drop_duplicates(subset=['a_retry','a_cca', 'a_rssif', 'a_lqif'])
dfb = dfb.drop_duplicates(subset=['b_retry','b_cca', 'b_rssif', 'b_lqif'])
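A sketch of the drop_duplicates approach on the same hypothetical data (a_val, b_val and the values are made up). The variant at the end is an assumption about what the asker may want: deduplicating only dfb keeps every dfa row, and the real pd.merge parameter validate='many_to_one' makes pandas raise MergeError if dfb's keys are still not unique.

```python
import pandas as pd

a_keys = ['a_retry', 'a_cca', 'a_rssif', 'a_lqif']
b_keys = ['b_retry', 'b_cca', 'b_rssif', 'b_lqif']

# Hypothetical data: 'a_val' / 'b_val' are made-up payload columns.
dfa = pd.DataFrame([[0, 1, -50, 100, 'x'],
                    [0, 1, -50, 100, 'y'],
                    [1, 0, -60, 90, 'z']], columns=a_keys + ['a_val'])
dfb = pd.DataFrame([[0, 1, -50, 100, 'p'],
                    [0, 1, -50, 100, 'q'],
                    [1, 0, -60, 90, 'r'],
                    [2, 2, -70, 80, 's']], columns=b_keys + ['b_val'])

# drop_duplicates keeps the first row of each key combination by default (keep='first').
dfa_u = dfa.drop_duplicates(subset=a_keys)   # rows 'x', 'z'
dfb_u = dfb.drop_duplicates(subset=b_keys)   # rows 'p', 'r', 's'

df = pd.merge(dfa_u, dfb_u, left_on=a_keys, right_on=b_keys)
print(sorted(zip(df['a_val'], df['b_val'])))   # [('x', 'p'), ('z', 'r')]

# Variant: deduplicate only dfb. Each dfa row then matches at most one dfb row,
# so the result can never have more rows than dfa.
df_left = pd.merge(dfa, dfb_u, left_on=a_keys, right_on=b_keys,
                   validate='many_to_one')
print(len(df_left))   # 3
```

Note the trade-off: deduplicating both frames silently discards row 'y' of dfa, while the one-sided variant keeps it but pairs it with the first matching dfb row ('p').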

But if you need to match the duplicate rows pairwise (first duplicate with first, second with second, and so on), you can add a new column with cumcount and use it as an extra merge key:

# number each occurrence of a key combination: 0, 1, 2, ...
dfa['new'] = dfa.groupby(['a_retry', 'a_cca', 'a_rssif', 'a_lqif']).cumcount()
dfb['new'] = dfb.groupby(['b_retry', 'b_cca', 'b_rssif', 'b_lqif']).cumcount()

# merge on the key columns plus the occurrence number, then drop the helper column
df = (pd.merge(dfa,
               dfb,
               left_on=['a_retry', 'a_cca', 'a_rssif', 'a_lqif', 'new'],
               right_on=['b_retry', 'b_cca', 'b_rssif', 'b_lqif', 'new'])
        .drop('new', axis=1))
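A runnable sketch of the cumcount approach on hypothetical data (a_val, b_val and all values are made up; only the key column names come from the question). Because 'new' has the same name on both sides, pandas keeps a single 'new' column in the result, which is why one drop removes it.

```python
import pandas as pd

a_keys = ['a_retry', 'a_cca', 'a_rssif', 'a_lqif']
b_keys = ['b_retry', 'b_cca', 'b_rssif', 'b_lqif']

# Hypothetical data: 'a_val' / 'b_val' are made-up payload columns.
dfa = pd.DataFrame([[0, 1, -50, 100, 'x'],
                    [0, 1, -50, 100, 'y'],
                    [1, 0, -60, 90, 'z']], columns=a_keys + ['a_val'])
dfb = pd.DataFrame([[0, 1, -50, 100, 'p'],
                    [0, 1, -50, 100, 'q'],
                    [1, 0, -60, 90, 'r'],
                    [2, 2, -70, 80, 's']], columns=b_keys + ['b_val'])

# cumcount numbers the occurrences of each key combination within its group.
dfa['new'] = dfa.groupby(a_keys).cumcount()   # x -> 0, y -> 1, z -> 0
dfb['new'] = dfb.groupby(b_keys).cumcount()   # p -> 0, q -> 1, r -> 0, s -> 0

# The n-th duplicate in dfa can now only pair with the n-th duplicate in dfb.
df = (pd.merge(dfa, dfb,
               left_on=a_keys + ['new'],
               right_on=b_keys + ['new'])
        .drop(columns='new'))

print(sorted(zip(df['a_val'], df['b_val'])))   # [('x', 'p'), ('y', 'q'), ('z', 'r')]
```

With this data the result has exactly as many rows as dfa. A dfa row is still dropped if dfb has fewer occurrences of its key, so the result can have fewer rows than dfa but never more.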

solution1
0 2018-05-23 14:57:44