Merge two pandas data frames on a common column and skip the common columns of the right frame

Question

I am trying to merge two pandas data frames (DF-1 and DF-2) on a common column (datetime). I imported both data frames from csv files. I want to add the non-common columns from DF-2 to DF-1 and ignore all the common columns from DF-2.

DF-1

date       time  open   high   low    close      datetime         col1            
2018-01-01 09:15  11    14     17     20     2018-01-01 09:15:00  101
2018-01-01 09:16  12    15     18     21     2018-01-01 09:16:00  102
2018-01-01 09:17  13    16     19     22     2018-01-01 09:17:00  103

DF-2

date       time  open   high   low    close      datetime         col2            
2018-01-01 09:15 23     26     29     32     2018-01-01 09:15:00  104
2018-01-01 09:16 24     27     30     33     2018-01-01 09:16:00  105
2018-01-01 09:17 25     28     31     34     2018-01-01 09:17:00  106

Merged DF (what I want)

date       time  open   high   low    close   datetime          col1   col2        
2018-01-01 09:15  11    14     17     20   2018-01-01 09:15:00  101    104
2018-01-01 09:16  12    15     18     21   2018-01-01 09:16:00  102    105
2018-01-01 09:17  13    16     19     22   2018-01-01 09:17:00  103    106

Code used: merged_left = pd.merge(left=DF1,right=DF2, how='left', left_on='datetime', right_on='datetime')

What I get: the two data frames merged, with the common columns duplicated as time_x, open_x, high_x, low_x, close_x, time_y, open_y, high_y, low_y, close_y, plus col1 and col2.

I want to ignore all the _y columns and keep the _x ones.

Any help would be greatly appreciated.

Answer 1

You can use the suffixes argument to make sure the second dataframe's duplicate columns get a known suffix. Then you can drop those columns with filter.

>>> df1
   a  b
0  1  2
>>> df2
   a  b  c
0  1  2  3
>>> df1.merge(df2, on=['a'], suffixes=['', '_y'])
   a  b  b_y  c
0  1  2    2  3
>>> df1.merge(df2, on=['a'], how='left', suffixes=['', '_y']).filter(regex='^(?!.*_y$)', axis=1)
   a  b  c
0  1  2  3

-- Edit -- I find filtering duplicate columns this way useful because it removes any number of duplicates, and you don't have to explicitly pass the column names you want to keep.
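
As a minimal sketch of the suffix-and-filter idea on frames shaped like the ones in the question: the frames below are trimmed to a few columns for brevity, and the negative lookahead `^(?!.*_y$)` is this sketch's choice so that column names of any length are handled. It assumes no column you want to keep already ends in `_y`.

```python
import pandas as pd

# Trimmed stand-ins for DF-1 and DF-2 (values taken from the question).
stamps = pd.to_datetime(
    ["2018-01-01 09:15:00", "2018-01-01 09:16:00", "2018-01-01 09:17:00"]
)
DF1 = pd.DataFrame(
    {"datetime": stamps, "open": [11, 12, 13], "close": [20, 21, 22], "col1": [101, 102, 103]}
)
DF2 = pd.DataFrame(
    {"datetime": stamps, "open": [23, 24, 25], "close": [32, 33, 34], "col2": [104, 105, 106]}
)

# Left columns keep their names (empty suffix); right duplicates get "_y".
# filter(regex=...) keeps only the columns whose name does not end in "_y".
merged = (
    DF1.merge(DF2, on="datetime", how="left", suffixes=("", "_y"))
       .filter(regex=r"^(?!.*_y$)")
)

print(merged.columns.tolist())
# ['datetime', 'open', 'close', 'col1', 'col2']
```

The values in `open` and `close` come from DF1, since only the right-hand duplicates carry the `_y` suffix and are filtered out.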

Answer 2

You can filter the columns of the right frame inside the merge call:

pd.merge(left=DF1,right=DF2[['datetime','col2']], how='left', left_on='datetime', right_on='datetime')
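
If you would rather not type the right-hand column names by hand, one possible variation is to compute them. The sketch below assumes the key column is named `datetime` as in the question; the `key` and `extra_cols` names are only illustrative, and the frames are trimmed stand-ins for DF-1 and DF-2.

```python
import pandas as pd

# Trimmed stand-ins for DF-1 and DF-2 (values taken from the question).
stamps = pd.to_datetime(
    ["2018-01-01 09:15:00", "2018-01-01 09:16:00", "2018-01-01 09:17:00"]
)
DF1 = pd.DataFrame(
    {"datetime": stamps, "open": [11, 12, 13], "close": [20, 21, 22], "col1": [101, 102, 103]}
)
DF2 = pd.DataFrame(
    {"datetime": stamps, "open": [23, 24, 25], "close": [32, 33, 34], "col2": [104, 105, 106]}
)

key = "datetime"

# Columns present in DF2 but not in DF1 (Index.difference returns them sorted).
extra_cols = DF2.columns.difference(DF1.columns).tolist()

# Merge only the key plus the DF2-only columns, so no _x/_y suffixes appear.
merged = DF1.merge(DF2[[key] + extra_cols], on=key, how="left")

print(merged.columns.tolist())
# ['datetime', 'open', 'close', 'col1', 'col2']
```

Because the overlapping columns never enter the merge, there is nothing to rename or drop afterwards.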

Answer 3

You could build a list of all the '_y' columns with a list comprehension, then pass it to DataFrame.drop:

# Match only names that end in '_y', so a name such as 'my_yield' is kept
drop_labels = [col for col in merged_left.columns if col.endswith('_y')]
merged_left.drop(drop_labels, axis=1, inplace=True)

That will leave you with all the unique columns and the _x columns.
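
To get from there to the exact layout asked for in the question, the remaining `_x` suffix can be stripped as well. The following is a minimal sketch on trimmed stand-ins for DF-1 and DF-2; the renaming step is an addition of this sketch, and it assumes no original column name ends in `_x` or `_y`.

```python
import pandas as pd

# Trimmed stand-ins for DF-1 and DF-2 (values taken from the question).
stamps = pd.to_datetime(
    ["2018-01-01 09:15:00", "2018-01-01 09:16:00", "2018-01-01 09:17:00"]
)
DF1 = pd.DataFrame(
    {"datetime": stamps, "open": [11, 12, 13], "close": [20, 21, 22], "col1": [101, 102, 103]}
)
DF2 = pd.DataFrame(
    {"datetime": stamps, "open": [23, 24, 25], "close": [32, 33, 34], "col2": [104, 105, 106]}
)

# Default suffixes: overlapping columns become open_x/open_y, close_x/close_y.
merged_left = pd.merge(left=DF1, right=DF2, how="left", on="datetime")

# Drop the right-hand duplicates, then strip the trailing "_x" from the rest.
drop_labels = [col for col in merged_left.columns if col.endswith("_y")]
cleaned = merged_left.drop(columns=drop_labels)
cleaned.columns = cleaned.columns.str.replace(r"_x$", "", regex=True)

print(cleaned.columns.tolist())
# ['datetime', 'open', 'close', 'col1', 'col2']
```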
