How to update column value of a data frame from another data frame matching 2 columns?

Question

I have 2 dataframes, and I want to update the score of rows with the same 2 column values.

How can I do that?

df 1:

DEP ID    |  Team ID |  Group    |   Score
001       |  002     |    A      |    50
001       |  004     |    A      |    70
002       |  002     |    A      |    50
002       |  007     |    A      |    90

df 2 (a subset of one department):

DEP ID    |  Team ID |  Group    |   Result
001       |  002     |    A      |    80
001       |  003     |    A      |    60
001       |  004     |    A      |    70

OUTPUT: All columns with the same TeamID and Group update the score

DEP ID    |  Team ID |  Group    |   Score
001       |  002     |    A      |    80
001       |  004     |    A      |    70
002       |  002     |    A      |    80
002       |  007     |    A      |    90

I've tried doing pd merge left join but I'm not really getting the expected result.

Any suggestions?

Answer 1

Here's a way to do it:

df1 = df1.join(df2.drop(columns='DEP ID').set_index(['Team ID', 'Group']), on=['Team ID', 'Group'])
df1.loc[df1.Result.notna(), 'Score'] = df1.Result
df1 = df1.drop(columns='Result')

Explanation:

modify df2 so it has Team ID, Group as its index and its only column is Result
use join to bring the new scores from df2 into a Result column in df1
use loc to update Score values for rows where Result is not null (ie, rows for which an updated Score is available)
drop the Result column.

Full test code:

import pandas as pd
import numpy as np
df1 = pd.DataFrame({
'DEP ID':['001','001','002','002'],
'Team ID':['002','004','002','007'],
'Group':['A','A','A','A'],
'Score':[50,70,50,90]})
df2 = pd.DataFrame({
'DEP ID':['001','001','001'],
'Team ID':['002','003','004'],
'Group':['A','A','A'],
'Result':[80,60,70]})

print(df1)
print(df2)

df1 = df1.join(df2.drop(columns='DEP ID').set_index(['Team ID', 'Group']), on=['Team ID', 'Group'])
df1.loc[df1.Result.notna(), 'Score'] = df1.Result
df1 = df1.drop(columns='Result')
print(df1)

Output:

   index DEP ID Team ID Group  Score
0      0    001     002     A     80
1      1    001     004     A     70
2      2    002     002     A     80
3      3    002     007     A     90

UPDATE :

If Result column in df2 is instead named Score , as asked by OP in a comment, then the code can be adjusted slightly as follows:

df1 = df1.join(df2.drop(columns='DEP ID').set_index(['Team ID', 'Group']), on=['Team ID', 'Group'], rsuffix='_NEW')
df1.loc[df1.Score_NEW.notna(), 'Score'] = df1.Score_NEW
df1 = df1.drop(columns='Score_NEW')

How to update column value of a data frame from another data frame matching 2 columns?

Question

1 answers

solution1
1 ACCPTED 2022-05-31 23:12:50

How to update column value of a data frame from another data frame matching 2 columns?

Question

1 answers

solution1 1 ACCPTED 2022-05-31 23:12:50

solution1
1 ACCPTED 2022-05-31 23:12:50