Pandas replace columns by merging another dataframe

Question

I have a dataframe df1 looks like this:

     id  A   B
0    1   10  5
1    1   11  6
2    2   10  7
3    2   11  8

And another dataframe df2 :

     id  A
0    1   3
1    2   4

Now I want to replace A column in df1 with the value of A in df2 based on id , so the result should look like this:

     id  A   B
0    1   3   5
1    1   3   6
2    2   4   7
3    2   4   8

There's a way that I can drop column A in df1 first and merge df2 to df1 on id like df1 = df1.drop(['A'], axis=1).merge(df2, how='left', on='id') , but if there're like 10 columns in df2 , it will be pretty hard. Is there a more elegant way to do so?

Answer 1

here is one way to do it, by making use of pd.update. However, it requires to set the index on the id, so it can match the two df

df.set_index('id', inplace=True)
df2.set_index('id', inplace=True)
df.update(df2)
df['A'] = df['A'].astype(int)  # value by default was of type float
df.reset_index()

    id  A   B
0   1   3   5
1   1   3   6
2   2   4   7
3   2   4   8

Answer 2

instead of dropping and merging, set the column based on a list built by iterating through df2

tmp.csv

id,A,B
1,3,5
1,3,6
2,4,7
2,4,8

tmp2.csv

id,A
1,3
2,4

code

import pandas as pd

df1 = pd.read_csv('tmp.csv')
df2 = pd.read_csv('tmp2.csv')

print('df1\n', df1)
print('df2\n', df2)

df1["A"] = [df2.loc[df2['id'] == row['id']].iloc[0]['A'] for i,row in df1.iterrows()]
print('result\n', df1)

output

df1
    id  A  B
0   1  3  5
1   1  3  6
2   2  4  7
3   2  4  8
df2
    id  A
0   1  3
1   2  4
result
    id  A  B
0   1  3  5
1   1  3  6
2   2  4  7
3   2  4  8

breakdown of whats happening

we set the "A" col of df1 by using list comprehension

df1["A"] = [ ... ]

the content of that list is based on iterating through df1 (to get the id we need to correlate to df2's "A" col)

... for i,row in df1.iterrows() ]

note that row['id'] will refer to the id in df1 we want to get a value for from df2

so now we get that value from df2

... df2.loc[df2['id'] == row['id']].iloc[0]['A'] ...

the reason we iloc[0] is because doing df2.loc[df2['id']==row['id]] will return a data frame. realistically, this data frame will only ever have 1 row in it

all together that gives us the 1 liner of list comprehension for setting some column based on how a value in that row correlates to another value in a separate df

df1["A"] = [df2.loc[df2['id'] == row['id']].iloc[0]['A'] for i,row in df1.iterrows()]

Answer 3

Merge just the id column from df to df2 , and then combine_first it to the original DataFrame:

df = df[['id']].merge(df2).combine_first(df)
print(df)

Output:

   A  B  id
0  3  5   1
1  3  6   1
2  4  7   2
3  4  8   2

Pandas replace columns by merging another dataframe

Question

2 answers

solution1
3 ACCPTED 2022-07-16 00:03:39

solution2
0 2022-07-15 23:28:30

breakdown of whats happening

solution3
0 2022-07-16 00:47:04

Pandas replace columns by merging another dataframe

Question

2 answers

solution1 3 ACCPTED 2022-07-16 00:03:39

solution2 0 2022-07-15 23:28:30

breakdown of whats happening

solution3 0 2022-07-16 00:47:04

solution1
3 ACCPTED 2022-07-16 00:03:39

solution2
0 2022-07-15 23:28:30

solution3
0 2022-07-16 00:47:04