
How to find two column values of one dataframe in another dataframe and replace the existing values if a match is found?

I want to compare the values of two columns, [a] and [b], in my test dataframe with those in my master dataframe. If a match is found, I want to copy the values of columns [x] and [y] from the test data into the master data.

If the master data contains duplicates for a given [a] and [b] combination, the respective values in [x] and [y] need to be updated in all of those rows.

If values are already present in [x] and [y] of the master data, they must be overwritten with those from the test data.

If the same combination appears more than once in the test data, for example 6545 in [a] and 342 in [b], which appears again in the last row, the value written to the master data first must be replaced with the latest one.

Master Data:

import pandas as pd
a1 = [['11A2',456,'c0','',''], ['16C3',523,'c1','',''], ['11A2',456,'c2','',''],['45CE',876,'c3',5,'13-03-2021'],[6545,342,'c4','',''],[8888,123,'c5',20,'21-02-2021'],['523V',654,'c6','','']]
master = pd.DataFrame(a1, columns=['a','b','c','x','y'])
    a       b   c   x   y
0   11A2    456 c0      
1   16C3    523 c1      
2   11A2    456 c2      
3   45CE    876 c3  5   13-03-2021
4   6545    342 c4      
5   8888    123 c5  20  21-02-2021
6   523V    654 c6      

Test Data:

a2 = [[6545,342,25,'25-05-2021'], ['45CE',876,15,'19-04-2021'], ['11A2',456,40,'07-09-2021'],[4444,321,51,'26-12-2021'],[8888,123,50,'30-03-2021'],[6545,342,50,'15-07-2021']]
test = pd.DataFrame(a2, columns=['a','b','x','y'])
    a       b   x   y
0   6545    342 25  25-05-2021
1   45CE    876 15  19-04-2021
2   11A2    456 40  07-09-2021
3   4444    321 51  26-12-2021
4   8888    123 50  30-03-2021
5   6545    342 50  15-07-2021

Final master data:

    a       b   c   x   y
0   11A2    456 c0  40  07-09-2021
1   16C3    523 c1      
2   11A2    456 c2  40  07-09-2021
3   45CE    876 c3  15  19-04-2021
4   6545    342 c4  50  15-07-2021
5   8888    123 c5  50  30-03-2021
6   523V    654 c6      

What I tried: I tried to map the dataframes using merge, but I get an error because of the duplicates. I also tried setting [a] and [b] as the index and using join, but that does not work with duplicate indices.

Do we need to iterate with a loop in this case?

No loop is needed. We can drop the duplicate rows in test with respect to columns a and b (keeping the last occurrence of each pair), set the index of both test and master to a and b, and then update the values in master using the values from test:

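c = ['a', 'b']
# Index master by the key columns so update() can align rows on (a, b).
master = master.set_index(c)
# Keep only the last test row per (a, b), then overwrite x and y in master in place.
master.update(test.drop_duplicates(c, keep='last').set_index(c))
master = master.reset_index()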

      a    b   c     x           y
0  11A2  456  c0  40.0  07-09-2021
1  16C3  523  c1                  
2  11A2  456  c2  40.0  07-09-2021
3  45CE  876  c3  15.0  19-04-2021
4  6545  342  c4  50.0  15-07-2021
5  8888  123  c5  50.0  30-03-2021
6  523V  654  c6                  
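
Since the question mentions trying merge, here is a rough sketch of how a left merge could give the same result once test has been de-duplicated on a and b. This is not part of the answer above, just one possible alternative. The names keys, latest, merged, matched and result are made up for illustration, and the sketch assumes the same sample data as in the question.

```python
import pandas as pd

# Sample data copied from the question.
a1 = [['11A2', 456, 'c0', '', ''], ['16C3', 523, 'c1', '', ''],
      ['11A2', 456, 'c2', '', ''], ['45CE', 876, 'c3', 5, '13-03-2021'],
      [6545, 342, 'c4', '', ''], [8888, 123, 'c5', 20, '21-02-2021'],
      ['523V', 654, 'c6', '', '']]
master = pd.DataFrame(a1, columns=['a', 'b', 'c', 'x', 'y'])

a2 = [[6545, 342, 25, '25-05-2021'], ['45CE', 876, 15, '19-04-2021'],
      ['11A2', 456, 40, '07-09-2021'], [4444, 321, 51, '26-12-2021'],
      [8888, 123, 50, '30-03-2021'], [6545, 342, 50, '15-07-2021']]
test = pd.DataFrame(a2, columns=['a', 'b', 'x', 'y'])

keys = ['a', 'b']  # illustrative name for the key columns

# Keep only the last test row for each (a, b) pair, so the right side of
# the merge is unique and the left merge cannot multiply master rows.
latest = test.drop_duplicates(keys, keep='last')

# A left merge keeps every master row in its original order. The test
# values arrive as x_new / y_new, and _merge marks the rows that matched.
merged = master.merge(latest, on=keys, how='left',
                      suffixes=('', '_new'), indicator=True)

# Overwrite x and y only where a matching (a, b) pair was found in test.
matched = merged['_merge'].eq('both')
merged.loc[matched, 'x'] = merged.loc[matched, 'x_new']
merged.loc[matched, 'y'] = merged.loc[matched, 'y_new']

# Drop the helper columns to get back the original layout.
result = merged.drop(columns=['x_new', 'y_new', '_merge'])
print(result)
```

Compared with update, this leaves master untouched and returns a new dataframe, and the _merge column makes it easy to see which rows were matched. The update approach is shorter if modifying master in place is acceptable.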

