How to find 2 column values of one dataframe in another dataframe and replace the initial values if match found?

Question

I want to compare the values of two columns, [a] and [b], in my test dataframe with those in my master dataframe. If a match is found, I want to copy the values of columns [x] and [y] from the test data into the master data.

If the master data contains duplicates of an [a] & [b] combination, the respective values in [x] & [y] need to be updated in all of those rows.

If values are already present in [x] and [y] of master data, then they must be updated with those in test data.

If the same combination appears more than once in the test data (for example, 6545 in [a] and 342 in [b] appears again in the last row), the value written to the master data earlier must be replaced with the latest one.

Master Data:

import pandas as pd

a1 = [['11A2',456,'c0','',''], ['16C3',523,'c1','',''], ['11A2',456,'c2','',''],['45CE',876,'c3',5,'13-03-2021'],[6545,342,'c4','',''],[8888,123,'c5',20,'21-02-2021'],['523V',654,'c6','','']]
master = pd.DataFrame(a1, columns=['a','b','c','x','y'])

    a       b   c   x   y
0   11A2    456 c0      
1   16C3    523 c1      
2   11A2    456 c2      
3   45CE    876 c3  5   13-03-2021
4   6545    342 c4      
5   8888    123 c5  20  21-02-2021
6   523V    654 c6

Test Data:

a2 = [[6545,342,25,'25-05-2021'], ['45CE',876,15,'19-04-2021'], ['11A2',456,40,'07-09-2021'],[4444,321,51,'26-12-2021'],[8888,123,50,'30-03-2021'],[6545,342,50,'15-07-2021']]
test = pd.DataFrame(a2, columns=['a','b','x','y'])

    a       b   x   y
0   6545    342 25  25-05-2021
1   45CE    876 15  19-04-2021
2   11A2    456 40  07-09-2021
3   4444    321 51  26-12-2021
4   8888    123 50  30-03-2021
5   6545    342 50  15-07-2021

Final master data:

    a       b   c   x   y
0   11A2    456 c0  40  07-09-2021
1   16C3    523 c1      
2   11A2    456 c2  40  07-09-2021
3   45CE    876 c3  15  19-04-2021
4   6545    342 c4  50  15-07-2021
5   8888    123 c5  50  30-03-2021
6   523V    654 c6

What I tried: I tried to map the dataframes using merge, but I get an error because of the duplicates. I also tried setting [a] and [b] as the index and using join, but that does not work with duplicate indices.

Do we need to iterate with a loop in this case?

Answer 1

We can drop the duplicate rows in test with respect to columns a and b (keeping the last occurrence of each pair), set the index of both test and master to a, b, and then update the values in master using the values from test:

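c = ['a', 'b']
# index master by the key columns so that update() can align on them
master = master.set_index(c)
# keep only the last test row per (a, b) pair; update() modifies master in place
master.update(test.drop_duplicates(c, keep='last').set_index(c))
master = master.reset_index()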

      a    b   c     x           y
0  11A2  456  c0  40.0  07-09-2021
1  16C3  523  c1                  
2  11A2  456  c2  40.0  07-09-2021
3  45CE  876  c3  15.0  19-04-2021
4  6545  342  c4  50.0  15-07-2021
5  8888  123  c5  50.0  30-03-2021
6  523V  654  c6
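
Since the question mentions that merge raised an error, here is a minimal sketch of a merge-based variant of the same idea, with no loop. It is not part of the answer above. It assumes the errors came from the duplicate (a, b) pairs in test, so it de-duplicates test first and then does a left merge. The names `keys`, `latest`, `merged`, `matched` and `updated` are chosen here for illustration only.

```python
import pandas as pd

a1 = [['11A2', 456, 'c0', '', ''], ['16C3', 523, 'c1', '', ''],
      ['11A2', 456, 'c2', '', ''], ['45CE', 876, 'c3', 5, '13-03-2021'],
      [6545, 342, 'c4', '', ''], [8888, 123, 'c5', 20, '21-02-2021'],
      ['523V', 654, 'c6', '', '']]
master = pd.DataFrame(a1, columns=['a', 'b', 'c', 'x', 'y'])

a2 = [[6545, 342, 25, '25-05-2021'], ['45CE', 876, 15, '19-04-2021'],
      ['11A2', 456, 40, '07-09-2021'], [4444, 321, 51, '26-12-2021'],
      [8888, 123, 50, '30-03-2021'], [6545, 342, 50, '15-07-2021']]
test = pd.DataFrame(a2, columns=['a', 'b', 'x', 'y'])

keys = ['a', 'b']

# Keep only the last test row for each (a, b) pair, so later rows win.
latest = test.drop_duplicates(keys, keep='last')

# A left merge keeps every master row in its original order. Because
# `latest` is unique on the keys, no master row gets duplicated.
merged = master.merge(latest, on=keys, how='left',
                      suffixes=('', '_new'), indicator=True)

# True for master rows whose (a, b) pair was found in test.
matched = (merged['_merge'] == 'both').to_numpy()

# Overwrite x and y only on the matched rows; other rows stay untouched.
updated = master.copy()
for col in ['x', 'y']:
    updated.loc[matched, col] = merged.loc[matched, col + '_new'].to_numpy()

print(updated)
```

As in the output above, the updated x values come back as floats (40.0, 15.0, ...), because the left merge introduces missing values into `x_new` for the unmatched rows. The test row with 4444 / 321 has no counterpart in master and is simply ignored.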

solution1
0 2021-10-19 16:17:28
