Pandas: How to merge two data frames and fill NaN values using values from the second data frame

Question

I have a pandas dataframe (df1) that looks like this:

No     car          pl.       Value      Expected      
1      Toyota       HK        0.1        0.12      
1      Toyota       NY        0.2        NaN     
2      Saab         LOS       0.3        NaN      
2      Saab         UK        0.4        0.6       
2      Saab         HK        0.5        0.51     
3      Audi         NYU       0.6        NaN      
3      Audi         LOS       0.7        NaN      
4      VW           UK        0.8        NaN   
5      Audi         HK        0.9        NaN

And I have another dataframe (df2) that looks like this:

No        pl.             Expected              
2         LOS              0.35      
3         NYU              0.62   
3         LOS              0.76    
5         HK               0.91

I would like my final dataframe to look like this:

No     car          pl.       Value      Expected      
1      Toyota       HK        0.1        0.12      
1      Toyota       NY        0.2        NaN     
2      Saab         LOS       0.3        0.35      
2      Saab         UK        0.4        0.6       
2      Saab         HK        0.5        0.51     
3      Audi         NYU       0.6        0.62      
3      Audi         LOS       0.7        0.76      
4      VW           UK        0.8        NaN   
5      Audi         HK        0.9        0.91

I tried this:

df = df1.fillna(df1.merge(df2, on=['No','pl.']))

But df1 remains unchanged in the output

The questions that I have seen here have been of dataframes with the same shape. Is there a way to do this when the shapes are different?

Thanks in advance!

Answer 1

Since we have two key columns where we want to match on and update our df1 dataframe, we can use set_index with fillna , since fillna aligns in the indices:

keys = ['No', 'pl.']
df1 = df1.set_index(keys).fillna(df2.set_index(keys)).reset_index()

   No  pl.     car  Value  Expected
0   1   HK  Toyota    0.1      0.12
1   1   NY  Toyota    0.2       NaN
2   2  LOS    Saab    0.3      0.35
3   2   UK    Saab    0.4      0.60
4   2   HK    Saab    0.5      0.51
5   3  NYU    Audi    0.6      0.62
6   3  LOS    Audi    0.7      0.76
7   4   UK      VW    0.8       NaN
8   5   HK    Audi    0.9      0.91

Or we can use the dedicated method Series.update for this:

df1 = df1.set_index(keys)
df1['Expected'].update(df2.set_index(keys)['Expected'])
df1 = df1.reset_index()

   No  pl.     car  Value  Expected
0   1   HK  Toyota    0.1      0.12
1   1   NY  Toyota    0.2       NaN
2   2  LOS    Saab    0.3      0.35
3   2   UK    Saab    0.4      0.60
4   2   HK    Saab    0.5      0.51
5   3  NYU    Audi    0.6      0.62
6   3  LOS    Audi    0.7      0.76
7   4   UK      VW    0.8       NaN
8   5   HK    Audi    0.9      0.91

Answer 2

Use left join with suffixes parameter and then replace missing values by Series.fillna with DataFrame.pop for use and drop column Expected_ :

df = df1.merge(df2, on=['No','pl.'], how='left', suffixes=('_',''))
df['Expected'] = df.pop('Expected_').fillna(df['Expected'])
print (df)
   No     car  pl.  Value  Expected
0   1  Toyota   HK    0.1      0.12
1   1  Toyota   NY    0.2       NaN
2   2    Saab  LOS    0.3      0.35
3   2    Saab   UK    0.4      0.60
4   2    Saab   HK    0.5      0.51
5   3    Audi  NYU    0.6      0.62
6   3    Audi  LOS    0.7      0.76
7   4      VW   UK    0.8       NaN
8   5    Audi   HK    0.9      0.91

Pandas: How to merge two data frames and fill NaN values using values from the second data frame

Question

2 answers

solution1
2 2020-04-19 10:35:25

solution2
1 ACCPTED 2020-04-19 10:30:23

Pandas: How to merge two data frames and fill NaN values using values from the second data frame

Question

2 answers

solution1 2 2020-04-19 10:35:25

solution2 1 ACCPTED 2020-04-19 10:30:23

solution1
2 2020-04-19 10:35:25

solution2
1 ACCPTED 2020-04-19 10:30:23