How to update values in multiple columns of a dataframe based on value of specific columns of another dataframe

Question

I have two pandas DataFrames:

df1 = pd.DataFrame({'id': [1001,1002,1004],
                   'col1': ["a","b","d"],
                   'col2': [1,2,6]})

df2 = pd.DataFrame({'id': [1001,1002,1003,1004,1005,1006,1007],
                    'a': [10,10,10,10,10,10,10],
                    'b': [2,2,2,2,2,2,2],
                    'c': [1,2,3,4,5,6,7],
                    'd': [5,5,5,5,5,4,5],
                    'e': [0,3,4,6,7,5,5]})

df1 and df2 share common id values.

For each row of df1, I want to subtract df1.col2 from the cell of df2 whose id matches df1.id and whose column name equals the value in df1.col1 (for example "a").

That may sound confusing, so here is an example:

df1 id 1001 has col1 = a and col2 = 1.

So I want to subtract 1 from column a of df2 for id 1001: 10 - 1 = 9.

Another example:

df1 id 1004 has col1 = d and col2 = 6, so subtract 6 from column d of df2 for id 1004: 5 - 6 = -1.

The final outcome should look like this:

    a  b  c  d  e     id
0   9  2  1  5  0   1001
1  10  0  2  5  3   1002
2  10  2  3  5  4   1003
3  10  2  4 -1  6   1004
4  10  2  5  5  7   1005
5  10  2  6  4  5   1006
6  10  2  7  5  5   1007
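
To make the two worked examples concrete, here is a minimal sketch of the single-cell arithmetic, using `.loc` with a boolean mask on id. It only checks the two cells discussed above (plus the 1002 row from the expected table) and is not meant as the efficient solution being asked for.

```python
import pandas as pd

df2 = pd.DataFrame({'id': [1001, 1002, 1003, 1004, 1005, 1006, 1007],
                    'a': [10, 10, 10, 10, 10, 10, 10],
                    'b': [2, 2, 2, 2, 2, 2, 2],
                    'c': [1, 2, 3, 4, 5, 6, 7],
                    'd': [5, 5, 5, 5, 5, 4, 5],
                    'e': [0, 3, 4, 6, 7, 5, 5]})

# id 1001, column a: 10 - 1 = 9
df2.loc[df2["id"] == 1001, "a"] -= 1
# id 1002, column b: 2 - 2 = 0
df2.loc[df2["id"] == 1002, "b"] -= 2
# id 1004, column d: 5 - 6 = -1
df2.loc[df2["id"] == 1004, "d"] -= 6

# Show only the rows and columns that were touched.
print(df2.loc[df2["id"].isin([1001, 1002, 1004]), ["id", "a", "b", "d"]])
```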

How should I go about solving this in pandas efficiently? I have to repeat this operation many times on large datasets.

Thanks in advance

Answer 1

Use apply with axis=1 to call a function on each row of df1, for example with a lambda:

    df1.apply(lambda row : updateDF2(row), axis=1)

The full sample code is:

import pandas as pd
df1 = pd.DataFrame({'id': [1001,1002,1004],
                   'col1': ["a","b","d"],
                   'col2': [1,2,6]})

df2 = pd.DataFrame({'id': [1001,1002,1003,1004,1005,1006,1007],
                    'a': [10,10,10,10,10,10,10],
                    'b': [2,2,2,2,2,2,2],
                    'c': [1,2,3,4,5,6,7],
                    'd': [5,5,5,5,5,4,5],
                    'e': [0,3,4,6,7,5,5]})
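
def updateDF2(row):
    # Select the df2 row(s) whose id matches this df1 row and subtract
    # col2 from the column named by col1. This modifies df2 in place.
    df2.loc[df2["id"] == row["id"], row["col1"]] -= row["col2"]

# Equivalent to: df1.apply(lambda row: updateDF2(row), axis=1)
df1.apply(updateDF2, axis=1)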

print(df2)

The output is:

    a  b  c  d  e    id
0   9  2  1  5  0  1001
1  10  0  2  5  3  1002
2  10  2  3  5  4  1003
3  10  2  4 -1  6  1004
4  10  2  5  5  7  1005
5  10  2  6  4  5  1006
6  10  2  7  5  5  1007
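
Because the row-wise apply above runs one Python-level call per row of df1, it may be slow on large inputs. As a possible alternative (a sketch, not part of the original answer), the updates can be translated into integer positions and applied in one NumPy call. It assumes that ids in df2 are unique and that every column of df2 other than id is numeric; the names value_cols, row_pos, col_pos and found are illustrative only.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({'id': [1001, 1002, 1004],
                    'col1': ["a", "b", "d"],
                    'col2': [1, 2, 6]})

df2 = pd.DataFrame({'id': [1001, 1002, 1003, 1004, 1005, 1006, 1007],
                    'a': [10, 10, 10, 10, 10, 10, 10],
                    'b': [2, 2, 2, 2, 2, 2, 2],
                    'c': [1, 2, 3, 4, 5, 6, 7],
                    'd': [5, 5, 5, 5, 5, 4, 5],
                    'e': [0, 3, 4, 6, 7, 5, 5]})

# Assumption: every column except 'id' is a numeric value column.
value_cols = [c for c in df2.columns if c != "id"]

# Translate each df1 row into a (row position, column position) pair in df2.
# get_indexer needs unique labels and returns -1 for labels it cannot find.
row_pos = pd.Index(df2["id"]).get_indexer(df1["id"])
col_pos = pd.Index(value_cols).get_indexer(df1["col1"])

# Keep only the updates whose id and column name both exist in df2.
found = (row_pos >= 0) & (col_pos >= 0)

# Work on an explicit copy of the values, then write them back.
values = df2[value_cols].to_numpy(copy=True)
# subtract.at applies every update, even if an (id, column) pair repeats.
np.subtract.at(values, (row_pos[found], col_pos[found]),
               df1.loc[found, "col2"].to_numpy())
df2[value_cols] = values

print(df2)
```

On the sample data this should produce the same values as the apply version. Whether it is actually faster depends on the sizes of df1 and df2, so it is worth timing both on real data.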

solution1
1 2018-12-13 08:25:59