I want to compare the values of two columns, [a] and [b], in a test DataFrame with those in my master DataFrame. When a match is found, I want to copy the values of columns [x] and [y] from the test data into the master data.
If the master data contains duplicate [a]/[b] combinations, the respective [x] and [y] values must be updated in all of those rows.
If [x] and [y] already hold values in the master data, they must be overwritten with the values from the test data.
If a combination appears more than once in the test data (for example, 6545 in [a] and 342 in [b] appears again in the last row), the value written first must be replaced by the latest one.
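The last rule ("latest one wins") is what pandas' drop_duplicates with keep='last' expresses: for each repeated (a, b) pair only the final row survives. A minimal sketch, using just three rows of the test data below for illustration:

```python
import pandas as pd

# Three rows taken from the question's test data; (6545, 342) occurs twice
test = pd.DataFrame(
    [[6545, 342, 25, '25-05-2021'],
     [8888, 123, 50, '30-03-2021'],
     [6545, 342, 50, '15-07-2021']],
    columns=['a', 'b', 'x', 'y'],
)

# keep='last' retains only the final occurrence of each (a, b) pair,
# so the later values (50, '15-07-2021') win over (25, '25-05-2021')
latest = test.drop_duplicates(['a', 'b'], keep='last')
print(latest)
```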
Master Data:
import pandas as pd

a1 = [['11A2',456,'c0','',''], ['16C3',523,'c1','',''], ['11A2',456,'c2','',''],['45CE',876,'c3',5,'13-03-2021'],[6545,342,'c4','',''],[8888,123,'c5',20,'21-02-2021'],['523V',654,'c6','','']]
master = pd.DataFrame(a1, columns=['a','b','c','x','y'])
      a    b   c   x           y
0  11A2  456  c0
1  16C3  523  c1
2  11A2  456  c2
3  45CE  876  c3   5  13-03-2021
4  6545  342  c4
5  8888  123  c5  20  21-02-2021
6  523V  654  c6
Test Data:
a2 = [[6545,342,25,'25-05-2021'], ['45CE',876,15,'19-04-2021'], ['11A2',456,40,'07-09-2021'],[4444,321,51,'26-12-2021'],[8888,123,50,'30-03-2021'],[6545,342,50,'15-07-2021']]
test = pd.DataFrame(a2, columns=['a','b','x','y'])
      a    b   x           y
0  6545  342  25  25-05-2021
1  45CE  876  15  19-04-2021
2  11A2  456  40  07-09-2021
3  4444  321  51  26-12-2021
4  8888  123  50  30-03-2021
5  6545  342  50  15-07-2021
Final master data:
      a    b   c   x           y
0  11A2  456  c0  40  07-09-2021
1  16C3  523  c1
2  11A2  456  c2  40  07-09-2021
3  45CE  876  c3  15  19-04-2021
4  6545  342  c4  50  15-07-2021
5  8888  123  c5  50  30-03-2021
6  523V  654  c6
What I tried: I tried to map the DataFrames using merge, but because of the duplicates I get an error. I also tried setting [a] and [b] as the index and using join, but that does not work with duplicate index values.
Do I need to iterate with a loop in this case?
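A loop does work, and it maps directly onto the four rules: walk the test rows in order and overwrite x and y on every master row with the same (a, b) pair, so a later duplicate in test simply overwrites an earlier one. A minimal sketch under that reading of the rules (it rebuilds the sample frames so it runs on its own):

```python
import pandas as pd

master = pd.DataFrame(
    [['11A2', 456, 'c0', '', ''], ['16C3', 523, 'c1', '', ''],
     ['11A2', 456, 'c2', '', ''], ['45CE', 876, 'c3', 5, '13-03-2021'],
     [6545, 342, 'c4', '', ''], [8888, 123, 'c5', 20, '21-02-2021'],
     ['523V', 654, 'c6', '', '']],
    columns=['a', 'b', 'c', 'x', 'y'],
)
test = pd.DataFrame(
    [[6545, 342, 25, '25-05-2021'], ['45CE', 876, 15, '19-04-2021'],
     ['11A2', 456, 40, '07-09-2021'], [4444, 321, 51, '26-12-2021'],
     [8888, 123, 50, '30-03-2021'], [6545, 342, 50, '15-07-2021']],
    columns=['a', 'b', 'x', 'y'],
)

# Process test rows in their original order
for row in test.itertuples(index=False):
    # Boolean mask of ALL master rows sharing this (a, b) pair (handles duplicates)
    mask = (master['a'] == row.a) & (master['b'] == row.b)
    # Overwrite unconditionally; a later test row with the same key replaces
    # whatever an earlier one wrote. Keys absent from master (4444) match nothing.
    master.loc[mask, 'x'] = row.x
    master.loc[mask, 'y'] = row.y

print(master)
```

This scans the whole of master once per test row, so it becomes slow on large frames; the vectorised update-based approach in the answer avoids that.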
We can drop the duplicate rows in test with respect to columns a and b, keeping the last occurrence so that the latest values win, then set the index of both test and master to a, b and update the values in master using the values from test:
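c = ['a', 'b']
# Index master by the (a, b) key; duplicate keys in master are allowed here
master = master.set_index(c)
# Keep only the latest test row per key, then overwrite the matching x/y values in master
master.update(test.drop_duplicates(c, keep='last').set_index(c))
# Move a and b back into ordinary columns
master = master.reset_index()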
      a    b   c     x           y
0  11A2  456  c0  40.0  07-09-2021
1  16C3  523  c1
2  11A2  456  c2  40.0  07-09-2021
3  45CE  876  c3  15.0  19-04-2021
4  6545  342  c4  50.0  15-07-2021
5  8888  123  c5  50.0  30-03-2021
6  523V  654  c6
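If you would rather keep a and b as ordinary columns, a merge also becomes well-defined once test has one row per key: a left merge against the deduplicated test frame returns exactly one output row per master row, including master's duplicate keys. This is a suggested alternative, not part of the answer above; the '_new' suffix and the intermediate names are arbitrary choices for this sketch:

```python
import pandas as pd

master = pd.DataFrame(
    [['11A2', 456, 'c0', '', ''], ['16C3', 523, 'c1', '', ''],
     ['11A2', 456, 'c2', '', ''], ['45CE', 876, 'c3', 5, '13-03-2021'],
     [6545, 342, 'c4', '', ''], [8888, 123, 'c5', 20, '21-02-2021'],
     ['523V', 654, 'c6', '', '']],
    columns=['a', 'b', 'c', 'x', 'y'],
)
test = pd.DataFrame(
    [[6545, 342, 25, '25-05-2021'], ['45CE', 876, 15, '19-04-2021'],
     ['11A2', 456, 40, '07-09-2021'], [4444, 321, 51, '26-12-2021'],
     [8888, 123, 50, '30-03-2021'], [6545, 342, 50, '15-07-2021']],
    columns=['a', 'b', 'x', 'y'],
)

keys = ['a', 'b']
# One row per (a, b) in test, keeping the latest occurrence
latest = test.drop_duplicates(keys, keep='last')

# Left merge keeps every master row in order; overlapping test columns
# come in as x_new / y_new (NaN where no test row matched)
merged = master.merge(latest, on=keys, how='left', suffixes=('', '_new'))

# Overwrite x and y only on rows where a test match was found
for col in ['x', 'y']:
    matched = merged[col + '_new'].notna()
    merged.loc[matched, col] = merged.loc[matched, col + '_new']

result = merged.drop(columns=['x_new', 'y_new'])
print(result)
```

As with update, the copied x values come out as floats (40.0), because x_new holds NaN for unmatched rows and is therefore a float column.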