I am new to pandas and need to complete the following task. Is there an effective way to do it? There are two different dataframes, dfa and dfb:
I used this to merge them together:
df = pd.merge(dfa, dfb, left_on = ['a_retry','a_cca', 'a_rssif', 'a_lqif'], right_on = ['b_retry','b_cca', 'b_rssif', 'b_lqif'])
However, the result is not what I expected. It is fine that the merged dataframe contains all columns, but the number of rows should not exceed that of the smaller dataframe (dfa), which means row 3 must be dropped. The expected result is:
How can I do that? Thanks.
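This is expected, because the 4 key columns contain duplicate rows, and merge returns every combination of matching rows for a duplicated key.
So you need to remove the duplicate rows with drop_duplicates: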
dfa = dfa.drop_duplicates(subset=['a_retry','a_cca', 'a_rssif', 'a_lqif'])
dfb = dfb.drop_duplicates(subset=['b_retry','b_cca', 'b_rssif', 'b_lqif'])
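To illustrate why the row count grows, here is a minimal sketch. The data is hypothetical (the original dataframes are not shown here); only the column names come from the question. One key appears twice in each frame, so the plain merge yields 2 x 2 rows for that key.

```python
import pandas as pd

a_keys = ['a_retry', 'a_cca', 'a_rssif', 'a_lqif']
b_keys = ['b_retry', 'b_cca', 'b_rssif', 'b_lqif']

# Hypothetical sample data: the key (0, 1, -80, 100) occurs twice in each frame.
dfa = pd.DataFrame({
    'a_retry': [0, 0, 1],
    'a_cca':   [1, 1, 1],
    'a_rssif': [-80, -80, -75],
    'a_lqif':  [100, 100, 90],
})
dfb = pd.DataFrame({
    'b_retry': [0, 0, 1, 2],
    'b_cca':   [1, 1, 1, 0],
    'b_rssif': [-80, -80, -75, -60],
    'b_lqif':  [100, 100, 90, 80],
})

# Plain merge: the duplicated key contributes 2 x 2 = 4 rows, the unique key 1 row.
plain = pd.merge(dfa, dfb, left_on=a_keys, right_on=b_keys)
print(len(plain))    # 5, more than len(dfa) == 3

# After dropping duplicates on the key columns, each key matches at most once.
deduped = pd.merge(dfa.drop_duplicates(subset=a_keys),
                   dfb.drop_duplicates(subset=b_keys),
                   left_on=a_keys, right_on=b_keys)
print(len(deduped))  # 2
```

Note that drop_duplicates keeps only the first row of each duplicated key by default, so the other rows are lost from the result.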
But if you need to match the duplicate rows with each other, that is possible with a new counter column created by cumcount, which is then used as an additional key for the merge:
dfa['new'] = dfa.groupby(['a_retry','a_cca', 'a_rssif', 'a_lqif']).cumcount()
dfb['new'] = dfb.groupby(['b_retry','b_cca', 'b_rssif', 'b_lqif']).cumcount()
df = (pd.merge(dfa,
               dfb,
               left_on=['a_retry', 'a_cca', 'a_rssif', 'a_lqif', 'new'],
               right_on=['b_retry', 'b_cca', 'b_rssif', 'b_lqif', 'new'])
        .drop('new', axis=1))
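The effect of the counter column can be checked with a small sketch. Again the data is hypothetical, and the a_id and b_id columns are added here only to make the pairing visible. The first occurrence of a key in dfa is paired with the first occurrence in dfb, the second with the second, and so on.

```python
import pandas as pd

a_keys = ['a_retry', 'a_cca', 'a_rssif', 'a_lqif']
b_keys = ['b_retry', 'b_cca', 'b_rssif', 'b_lqif']

# Hypothetical sample data; a_id / b_id are illustrative row labels only.
dfa = pd.DataFrame({
    'a_id':    [1, 2, 3],
    'a_retry': [0, 0, 1],
    'a_cca':   [1, 1, 1],
    'a_rssif': [-80, -80, -75],
    'a_lqif':  [100, 100, 90],
})
dfb = pd.DataFrame({
    'b_id':    [10, 11, 12, 13],
    'b_retry': [0, 0, 1, 2],
    'b_cca':   [1, 1, 1, 0],
    'b_rssif': [-80, -80, -75, -60],
    'b_lqif':  [100, 100, 90, 80],
})

# cumcount numbers the rows inside each key group: 0, 1, 2, ...
left = dfa.assign(new=dfa.groupby(a_keys).cumcount())
right = dfb.assign(new=dfb.groupby(b_keys).cumcount())

# Merging on the keys plus the counter pairs the n-th duplicate with the n-th duplicate.
paired = (pd.merge(left, right,
                   left_on=a_keys + ['new'],
                   right_on=b_keys + ['new'])
            .drop('new', axis=1))

print(len(paired))  # 3, no more than len(dfa)
print(paired[['a_id', 'b_id']])
```

This sketch uses assign so the original dfa and dfb are left unmodified. A row whose counter value has no partner in the other frame is dropped by the default inner join.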