简体   繁体   中英

Row wise merge dataframes with overlapping data in python/ pandas

I have two large data frames which I need to merge row-wise. These two dataframes may have a few overlapping rows.

Example:

data1
    key name               patent
    11  Alphagrep          112344
    12  Citrix             112345

data2
    Sr  name               patents
    11  Alphagrep          112344
    13  Taj                112322

I want to merge these dataframes row wise on [Key,Sr] & [patent, patents]

Which is: If row-data1[key]==row-data2[Sr] & row-data1[patent]==row-data2[patents], Merge or else append.

Result should be:

data1 + data2
    key name               patent
    11  Alphagrep          112344
    12  Citrix             112345
    13  Taj                112322

How should one do this in pandas?

You can do something like this:

print data1, '\n', '\n', data2, '\n'
data2.columns = data1.columns.tolist()
data3 = pd.concat([data1, data2]).drop_duplicates()
print data3

Output:

   key       name  patent
0   11  Alphagrep  112344
1   12     Citrix  112345 

   Sr       name  patents
0  11  Alphagrep   112344
1  13        Taj   112322 

   key       name  patent
0   11  Alphagrep  112344
1   12     Citrix  112345
1   13        Taj  112322

set_index and combine_first

c1 = ['key', 'patent']
c2 = ['Sr', 'patents']

data1.set_index(c1) \
    .combine_first(
        data2.set_index(c2).rename_axis(c1)
    ).reset_index()

   key  patent       name
0   11  112344  Alphagrep
1   12  112345     Citrix
2   13  112322        Taj

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM