简体   繁体   中英

How to replace one pandas dataframe column values based on some other dataframe?

I have two dataframes. df1 and df2 . This is the content of df1

  col1  col2  col3
0    1    12   100
1    2    34   200
2    3    56   300
3    4    78   400

This is the content of df2

  col1  col2  col3
0    2  1984   500
1    3  4891   600

I want to have this final data frame:

  col1  col2  col3
0    1    12   100
1    2  1984   200
2    3  4891   300
3    4    78   400

Note that col1 is the primary key in df1 and df2 . I tried to do it via mapping values, but I could not make it work.

Here is an MCVE for checking those data frames easily:

import pandas as pd
d = {'col1': ['1', '2','3','4'], 'col2': [12, 34,56,78],'col3':[100,200,300,400]}
df1 = pd.DataFrame(data=d)
d = {'col1': ['2','3'], 'col2': [1984,4891],'col3':[500,600]}
df2 = pd.DataFrame(data=d)
print(df1)
print(df2)
d = {'col1': ['1', '2','3','4'], 'col2': [12, 1984,4891,78],'col3':[100,200,300,400]}
df_final = pd.DataFrame(data=d)
print(df_final)

You can map and fillna :

df1['col2'] = (df1['col1']
               .map(df2.set_index('col1')['col2'])
               .fillna(df1['col2'], downcast='infer')
              )

output:

  col1  col2  col3
0    1    12   100
1    2  1984   200
2    3  4891   300
3    4    78   400

If col1 is unique, combine_first is an option, too:

>>> (df2.drop("col3", axis=1)
        .set_index("col1")
        .combine_first(df1.set_index("col1"))
        .reset_index()
    )
  col1  col2  col3
0    1    12   100
1    2  1984   200
2    3  4891   300
3    4    78   400

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM