
Merge two pandas DataFrames on a common column and skip the right frame's common columns

I am trying to merge two pandas DataFrames (DF-1 and DF-2) on a common column (datetime). Both were imported from CSV files. I want to add the non-common columns from DF-2 to DF-1 and ignore all the common columns from DF-2.

DF-1

date       time  open   high   low    close      datetime         col1            
2018-01-01 09:15  11    14     17     20     2018-01-01 09:15:00  101
2018-01-01 09:16  12    15     18     21     2018-01-01 09:16:00  102
2018-01-01 09:17  13    16     19     22     2018-01-01 09:17:00  103

DF-2

date       time  open   high   low    close      datetime         col2            
2018-01-01 09:15 23     26     29     32     2018-01-01 09:15:00  104
2018-01-01 09:16 24     27     30     33     2018-01-01 09:16:00  105
2018-01-01 09:17 25     28     31     34     2018-01-01 09:17:00  106

Merged DF (what I want)

date       time  open   high   low    close   datetime          col1   col2        
2018-01-01 09:15  11    14     17     20   2018-01-01 09:15:00  101    104
2018-01-01 09:16  12    15     18     21   2018-01-01 09:16:00  102    105
2018-01-01 09:17  13    16     19     22   2018-01-01 09:17:00  103    106

Code used: merged_left = pd.merge(left=DF1,right=DF2, how='left', left_on='datetime', right_on='datetime')

What I get: the two DataFrames merged, with the common columns renamed to time_x, open_x, high_x, low_x, close_x and time_y, open_y, high_y, low_y, close_y, plus col1 and col2.

I want to drop all the _y columns and keep the _x ones.

Any help would be greatly appreciated.

You can use suffixes to make sure the second DataFrame's duplicate columns are named in a predictable way. Then you can remove those columns with filter.

>>> df1
   a  b
0  1  2
>>> df2
   a  b  c
0  1  2  3
>>> df1.merge(df2, on=['a'], suffixes=['', '_y'])
   a  b  b_y  c
0  1  2    2  3
>>> df1.merge(df2, on=['a'], how='left', suffixes=['', '_y']).filter(regex='^(?!.*_y$)', axis=1)
   a  b  c
0  1  2  3

-- Edit -- I find filtering duplicate columns this way useful because it removes an arbitrary number of duplicates, and you don't have to explicitly pass the column names you want to keep.
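As a rough sketch, here is the same idea applied to frames shaped like the ones in the question. The frames are abbreviated (only a few of the question's columns are kept), and the regex `^(?!.*_y$)` is just one way to keep names that do not end in `_y`.

```python
import pandas as pd

# Abbreviated stand-ins for DF-1 and DF-2; values are taken from the question.
stamps = pd.to_datetime(
    ["2018-01-01 09:15:00", "2018-01-01 09:16:00", "2018-01-01 09:17:00"]
)
df1 = pd.DataFrame(
    {"datetime": stamps, "open": [11, 12, 13], "close": [20, 21, 22], "col1": [101, 102, 103]}
)
df2 = pd.DataFrame(
    {"datetime": stamps, "open": [23, 24, 25], "close": [32, 33, 34], "col2": [104, 105, 106]}
)

# Left-frame columns keep their names; overlapping right-frame columns get "_y".
merged = df1.merge(df2, on="datetime", how="left", suffixes=("", "_y"))

# Keep every column whose name does not end in "_y".
result = merged.filter(regex=r"^(?!.*_y$)", axis=1)
print(result.columns.tolist())
```

Note that this would also drop any genuine column whose name happens to end in `_y`, so a less common suffix may be safer on real data.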

You can select the columns you need from the right DataFrame inside the merge call:

pd.merge(left=DF1,right=DF2[['datetime','col2']], how='left', left_on='datetime', right_on='datetime')
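If the right frame has many extra columns, the list does not have to be typed by hand. The following is a minimal sketch that works out the right-only columns before merging; `key` and `right_only` are illustrative names, and the frames are abbreviated versions of the question's data.

```python
import pandas as pd

# Abbreviated stand-ins for DF-1 and DF-2; values are taken from the question.
stamps = pd.to_datetime(
    ["2018-01-01 09:15:00", "2018-01-01 09:16:00", "2018-01-01 09:17:00"]
)
df1 = pd.DataFrame({"datetime": stamps, "open": [11, 12, 13], "col1": [101, 102, 103]})
df2 = pd.DataFrame({"datetime": stamps, "open": [23, 24, 25], "col2": [104, 105, 106]})

key = "datetime"  # the join column

# Columns that exist only in the right frame, in their original order.
right_only = [c for c in df2.columns if c not in df1.columns]

# Merge the key plus the right-only columns, so no _x/_y suffixes are created.
merged = df1.merge(df2[[key] + right_only], on=key, how="left")
print(merged.columns.tolist())
```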

You could build a list of all the '_y' columns with a list comprehension, then pass it to DataFrame.drop

# collect only the columns whose names end with the right-hand suffix
drop_labels = [col for col in merged_left.columns if col.endswith('_y')]
merged_left.drop(drop_labels, axis=1, inplace=True)

That leaves you with all the unique columns and the _x columns.
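To get back the original column names as in the desired output, the `_x` suffix can be stripped afterwards. This is a sketch under the assumption that no genuine column name ends in `_x` or `_y`; the frames are abbreviated versions of the question's data.

```python
import pandas as pd

# Abbreviated stand-ins for DF-1 and DF-2; values are taken from the question.
stamps = pd.to_datetime(
    ["2018-01-01 09:15:00", "2018-01-01 09:16:00", "2018-01-01 09:17:00"]
)
DF1 = pd.DataFrame({"datetime": stamps, "open": [11, 12, 13], "col1": [101, 102, 103]})
DF2 = pd.DataFrame({"datetime": stamps, "open": [23, 24, 25], "col2": [104, 105, 106]})

# Default suffixes: overlapping columns become open_x (left) and open_y (right).
merged_left = pd.merge(left=DF1, right=DF2, how="left", on="datetime")

# Drop the right-hand duplicates.
drop_labels = [col for col in merged_left.columns if col.endswith("_y")]
merged_left = merged_left.drop(columns=drop_labels)

# Strip the trailing "_x" so the left-hand columns regain their original names.
merged_left = merged_left.rename(
    columns=lambda col: col[:-2] if col.endswith("_x") else col
)
print(merged_left.columns.tolist())
```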
