简体   繁体   中英

Merging two pandas dataframes on multiple columns

I have two dataframes:

>>> df1
[Output]: col1   col2   col3   col4
           a     abc     10    str1
           b     abc     20    str2
           c     def     20    str2
           d     abc     30    str2

>>> df2
[Output]: col1   col2   col3   col5   col6
           d     abc     30    str6    47
           b     abc     20    str5    66
           c     def     20    str7    53
           a     abc     10    str5    21

Below is what I want to generate:

>>> df_merged
[Output]: col1   col2   col5
           a     abc    str5
           b     abc    str5 
           c     def    str7
           d     abc    str6

I don't want to generate more than 4 rows and that is usually what happens when I try to merge the dataframes. Thanks for the tips!

df_merged = pd.DataFrame()
df_merged['col1'] = df1['col1'][0:3]
df_merged['col2'] = df1['col2'][0:3]
df_merged['col5'] = df2['col5'][0:3]

Does that help with what you're looking for?

Use .merge by subselecting the correct columns and using col1 & col2 as key columns:

df1[['col1', 'col2']].merge(df2[['col1', 'col2', 'col5']], on=['col1', 'col2'])

  col1 col2  col5
0    a  abc  str5
1    b  abc  str5
2    c  def  str7
3    d  abc  str6

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM