简体   繁体   中英

Pandas: How to merge two data frames and fill NaN values using values from the second data frame

I have a pandas dataframe (df1) that looks like this:

No     car          pl.       Value      Expected      
1      Toyota       HK        0.1        0.12      
1      Toyota       NY        0.2        NaN     
2      Saab         LOS       0.3        NaN      
2      Saab         UK        0.4        0.6       
2      Saab         HK        0.5        0.51     
3      Audi         NYU       0.6        NaN      
3      Audi         LOS       0.7        NaN      
4      VW           UK        0.8        NaN   
5      Audi         HK        0.9        NaN    

And I have another dataframe (df2) that looks like this:

No        pl.             Expected              
2         LOS              0.35      
3         NYU              0.62   
3         LOS              0.76    
5         HK               0.91     

I would like my final dataframe to look like this:

No     car          pl.       Value      Expected      
1      Toyota       HK        0.1        0.12      
1      Toyota       NY        0.2        NaN     
2      Saab         LOS       0.3        0.35      
2      Saab         UK        0.4        0.6       
2      Saab         HK        0.5        0.51     
3      Audi         NYU       0.6        0.62      
3      Audi         LOS       0.7        0.76      
4      VW           UK        0.8        NaN   
5      Audi         HK        0.9        0.91    

I tried this:

df = df1.fillna(df1.merge(df2, on=['No','pl.']))

But df1 remains unchanged in the output

The questions that I have seen here have been of dataframes with the same shape. Is there a way to do this when the shapes are different?

Thanks in advance!

Since we have two key columns where we want to match on and update our df1 dataframe, we can use set_index with fillna , since fillna aligns in the indices:

keys = ['No', 'pl.']
df1 = df1.set_index(keys).fillna(df2.set_index(keys)).reset_index()

   No  pl.     car  Value  Expected
0   1   HK  Toyota    0.1      0.12
1   1   NY  Toyota    0.2       NaN
2   2  LOS    Saab    0.3      0.35
3   2   UK    Saab    0.4      0.60
4   2   HK    Saab    0.5      0.51
5   3  NYU    Audi    0.6      0.62
6   3  LOS    Audi    0.7      0.76
7   4   UK      VW    0.8       NaN
8   5   HK    Audi    0.9      0.91

Or we can use the dedicated method Series.update for this:

df1 = df1.set_index(keys)
df1['Expected'].update(df2.set_index(keys)['Expected'])
df1 = df1.reset_index()

   No  pl.     car  Value  Expected
0   1   HK  Toyota    0.1      0.12
1   1   NY  Toyota    0.2       NaN
2   2  LOS    Saab    0.3      0.35
3   2   UK    Saab    0.4      0.60
4   2   HK    Saab    0.5      0.51
5   3  NYU    Audi    0.6      0.62
6   3  LOS    Audi    0.7      0.76
7   4   UK      VW    0.8       NaN
8   5   HK    Audi    0.9      0.91

Use left join with suffixes parameter and then replace missing values by Series.fillna with DataFrame.pop for use and drop column Expected_ :

df = df1.merge(df2, on=['No','pl.'], how='left', suffixes=('_',''))
df['Expected'] = df.pop('Expected_').fillna(df['Expected'])
print (df)
   No     car  pl.  Value  Expected
0   1  Toyota   HK    0.1      0.12
1   1  Toyota   NY    0.2       NaN
2   2    Saab  LOS    0.3      0.35
3   2    Saab   UK    0.4      0.60
4   2    Saab   HK    0.5      0.51
5   3    Audi  NYU    0.6      0.62
6   3    Audi  LOS    0.7      0.76
7   4      VW   UK    0.8       NaN
8   5    Audi   HK    0.9      0.91

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM