
How to Replace Blank Indexes in One Dataframe with Indexes From Another Dataframe

I have two dataframes, df1 and df2.

df1 is scraped data:

  Name         ID   Symbol
0  AAA   23135106         
1  Bbb  G06242104  String2
2  Ccc  30303M102  String3
3  DDD   2079K305         
4        2079K107  

And df2 is reference data:

  Name         ID   Symbol
0  Aaa   23135106  String1
1  Bbb  G06242104  String2
2  Ccc  98980L101  String3
3  Ddd   2079K305  String4
4  Eee   2079K107  String5
5  Fff    287Y109  String6
6  Ggg     380105  String7
7  Hhh  G00349103  String8

By using ID as the key, I want to:

  1. populate the empty Symbols and Names in df1 with those in df2, and
  2. replace the malformatted Names in df1 (e.g., AAA vs Aaa) with those in df2,

so that the end result looks like:

  Name         ID   Symbol
0  Aaa   23135106  String1       
1  Bbb  G06242104  String2
2  Ccc  30303M102  String3
3  Ddd   2079K305  String4       
4  Eee   2079K107  String5

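fillna and map are what you need. This fills only Symbol, and it assumes the blank Symbols are NaN; if they are empty strings, replace them with NaN first: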

df1['Symbol'] = df1.Symbol.fillna(df1.ID.map(df2.set_index('ID').Symbol)) 

Output:

  Name         ID   Symbol
0  AAA   23135106  String1
1  Bbb  G06242104  String2
2  Ccc  30303M102  String3
3  DDD   2079K305  String4
4  EEE   2079K107  String5
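
As a minimal sketch of this approach, assuming the blank cells in df1 are empty strings rather than NaN (the question does not say which), the blanks can be converted to NaN first so that fillna has something to fill. The frames are rebuilt by hand from the tables in the question, and the name `lookup` is only illustrative. This fills blank Names as well, but leaves existing ones such as AAA untouched.

```python
import numpy as np
import pandas as pd

# Frames rebuilt from the question; blanks are assumed to be empty strings.
df1 = pd.DataFrame({
    "Name": ["AAA", "Bbb", "Ccc", "DDD", ""],
    "ID": ["23135106", "G06242104", "30303M102", "2079K305", "2079K107"],
    "Symbol": ["", "String2", "String3", "", ""],
})
df2 = pd.DataFrame({
    "Name": ["Aaa", "Bbb", "Ccc", "Ddd", "Eee", "Fff", "Ggg", "Hhh"],
    "ID": ["23135106", "G06242104", "98980L101", "2079K305",
           "2079K107", "287Y109", "380105", "G00349103"],
    "Symbol": [f"String{i}" for i in range(1, 9)],
})

# fillna only touches NaN, so turn empty strings into NaN first.
df1 = df1.replace("", np.nan)

# Index the reference data by ID so map() can look values up by ID.
lookup = df2.set_index("ID")
for col in ["Name", "Symbol"]:
    df1[col] = df1[col].fillna(df1["ID"].map(lookup[col]))

print(df1)
```

IDs that do not appear in df2 (here 30303M102) map to NaN, so their existing values are simply kept.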

I think you only need DataFrame.merge + DataFrame.fillna:

df1[['Name','ID']].merge(df2[['ID','Symbol']],on='ID',how = 'left').fillna(df1)

  Name         ID   Symbol
0  AAA   23135106  String1
1  Bbb  G06242104  String2
2  Ccc  30303M102  String3
3  DDD   2079K305  String4
4  EEE   2079K107  String5

or, to take the Names from df2 as well:

( df1[['ID']].merge(df2[['Name','ID','Symbol']],on='ID',how = 'left')
             .fillna(df1)
             .reindex(columns = df1.columns) )

  Name         ID   Symbol
0  Aaa   23135106  String1       
1  Bbb  G06242104  String2
2  Ccc  30303M102  String3
3  Ddd   2079K305  String4       
4  Eee   2079K107  String5
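
The second variant can be run end to end as below. This is a sketch under a few assumptions not stated in the answer: df1 has the default 0..n-1 index, the IDs in df2 are unique (so the left merge keeps one row per df1 row, in the same order), and the blanks in df1 are empty strings. The name `result` is illustrative.

```python
import pandas as pd

# Frames rebuilt from the question; blanks are assumed to be empty strings.
df1 = pd.DataFrame({
    "Name": ["AAA", "Bbb", "Ccc", "DDD", ""],
    "ID": ["23135106", "G06242104", "30303M102", "2079K305", "2079K107"],
    "Symbol": ["", "String2", "String3", "", ""],
})
df2 = pd.DataFrame({
    "Name": ["Aaa", "Bbb", "Ccc", "Ddd", "Eee", "Fff", "Ggg", "Hhh"],
    "ID": ["23135106", "G06242104", "98980L101", "2079K305",
           "2079K107", "287Y109", "380105", "G00349103"],
    "Symbol": [f"String{i}" for i in range(1, 9)],
})

result = (
    df1[["ID"]]
    # Pull Name and Symbol from the reference data for every ID in df1.
    .merge(df2[["Name", "ID", "Symbol"]], on="ID", how="left")
    # IDs missing from df2 come back as NaN; fall back to df1's own values.
    .fillna(df1)
    # Restore df1's original column order.
    .reindex(columns=df1.columns)
)

print(result)
```

Because fillna(df1) aligns on the row index, the fallback is only correct when the merged frame and df1 share the same row labels, which is why the default index and unique IDs are assumed.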

If you need to update both Name and Symbol, use update together with a sliced assignment:

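# Index both frames by ID so that update() aligns rows on ID
df1_1 = df1.set_index('ID')
df1_1.update(df2.set_index('ID'))
# Copy the updated values back only into rows whose Symbol is an empty string
df1.loc[df1.Symbol == '', ['Name', 'Symbol']] = df1_1.reset_index()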

Output:
  Name         ID   Symbol
0  Aaa   23135106  String1
1  Bbb  G06242104  String2
2  Ccc  30303M102  String3
3  Ddd   2079K305  String4
4  Eee   2079K107  String5
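
A simplified, self-contained sketch of the update idea follows. It is not the exact code above: instead of writing back only the rows with a blank Symbol, it overwrites Name and Symbol for every ID that also exists in df2, which gives the same result for this data. It assumes the IDs in df2 are unique and that blanks in df1 are empty strings; the name `out` is illustrative.

```python
import pandas as pd

# Frames rebuilt from the question; blanks are assumed to be empty strings.
df1 = pd.DataFrame({
    "Name": ["AAA", "Bbb", "Ccc", "DDD", ""],
    "ID": ["23135106", "G06242104", "30303M102", "2079K305", "2079K107"],
    "Symbol": ["", "String2", "String3", "", ""],
})
df2 = pd.DataFrame({
    "Name": ["Aaa", "Bbb", "Ccc", "Ddd", "Eee", "Fff", "Ggg", "Hhh"],
    "ID": ["23135106", "G06242104", "98980L101", "2079K305",
           "2079K107", "287Y109", "380105", "G00349103"],
    "Symbol": [f"String{i}" for i in range(1, 9)],
})

# Index by ID so update() matches rows on ID rather than on position.
out = df1.set_index("ID")

# update() modifies `out` in place, overwriting values wherever df2 has a
# non-missing value for the same ID and column. IDs that exist only in df2
# are ignored, and IDs that exist only in df1 are left unchanged.
out.update(df2.set_index("ID"))

# Move ID back to a column and restore the original column order.
out = out.reset_index()[df1.columns]

print(out)
```

The trade-off is that any correct value in df1 that disagrees with df2 is also overwritten, so this fits best when df2 is the trusted reference.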
