
Pandas: replace values in one dataframe with values from another dataframe based on two columns

I have two dataframes:

import numpy as np
import pandas as pd

d1 = {'id_': ['a','b','c','d'],
     'year':['2018','2019','2017','2019']}
d2 = {'id_': ['a','c','e'],
     'year':['2015',np.nan,'2012']}
test1 = pd.DataFrame(d1)
test2 = pd.DataFrame(d2)


    id_ year
0   a   2018
1   b   2019
2   c   2017
3   d   2019

    id_ year
0   a   2015
1   c   NaN
2   e   2012

I need to replace the year values in test1 with the year values from test2, but only for rows whose id_ matches. If the value in test2 is NaN, I'd like to keep the old value from test1.

So the result should look like:

    id_ year
0   a   2015
1   b   2019
2   c   2017
3   d   2019

All the answers I came across were based on the index, or on mapping old values to new values with dictionaries. I'd appreciate your help.
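To check the solutions against the expected result, one option is to build that result explicitly and compare with pandas.testing.assert_frame_equal. This is a minimal sketch; the names expected and check are hypothetical, and the comparison sorts by id_ first because some approaches change the row order.

```python
import numpy as np
import pandas as pd

# Inputs from the question (np.nan marks the missing year for 'c').
test1 = pd.DataFrame({'id_': ['a', 'b', 'c', 'd'],
                      'year': ['2018', '2019', '2017', '2019']})
test2 = pd.DataFrame({'id_': ['a', 'c', 'e'],
                      'year': ['2015', np.nan, '2012']})

# Hypothetical expected frame: only 'a' changes; 'c' keeps 2017 because
# test2 has NaN there, and 'e' is ignored because it is not in test1.
expected = pd.DataFrame({'id_': ['a', 'b', 'c', 'd'],
                         'year': ['2015', '2019', '2017', '2019']})


def check(result: pd.DataFrame) -> None:
    """Raise AssertionError if result does not match the expected frame."""
    # Ignore row order and index labels by sorting on id_ and renumbering.
    got = result.sort_values('id_').reset_index(drop=True)
    pd.testing.assert_frame_equal(got, expected, check_dtype=False)
```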

Using DataFrame.update, which aligns on the index and copies only non-NaN values from the other frame:

test1=test1.set_index('id_')
test1.update(test2.set_index('id_'))
test1.reset_index(inplace=True)
test1
  id_  year
0   a  2015
1   b  2019
2   c  2017
3   d  2019
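Wrapped as a function that leaves test1 untouched, the update approach might look like the sketch below. The function name replace_by_id is hypothetical; the behaviour it relies on is that update copies only non-NA values and ignores index labels (here 'e') that are not present in the frame being updated.

```python
import numpy as np
import pandas as pd

test1 = pd.DataFrame({'id_': ['a', 'b', 'c', 'd'],
                      'year': ['2018', '2019', '2017', '2019']})
test2 = pd.DataFrame({'id_': ['a', 'c', 'e'],
                      'year': ['2015', np.nan, '2012']})


def replace_by_id(left: pd.DataFrame, right: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of left with values overwritten from right where id_ matches."""
    out = left.set_index('id_')          # set_index returns a new frame
    out.update(right.set_index('id_'))   # in-place, but only on that new frame
    return out.reset_index()             # turn id_ back into a column


result = replace_by_id(test1, test2)
print(result)
```

Because update works in place, set_index is what keeps this non-destructive: it returns a new frame, so test1 itself still has 2018 for 'a'.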

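Another option is concat plus drop_duplicates: append the usable rows from test2 after test1 and keep the last row for each id_. Note that the replaced rows end up at the bottom of the result.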

test3 = test2[test2['id_'].isin(test1['id_'])].dropna()
pd.concat([test1, test3]).drop_duplicates('id_', keep='last')   

  id_  year
1   b  2019
2   c  2017
3   d  2019
0   a  2015
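If the original row order matters, one way to restore it after concat and drop_duplicates (assuming id_ values are unique in test1) is to reindex by test1's ids. A sketch:

```python
import numpy as np
import pandas as pd

test1 = pd.DataFrame({'id_': ['a', 'b', 'c', 'd'],
                      'year': ['2018', '2019', '2017', '2019']})
test2 = pd.DataFrame({'id_': ['a', 'c', 'e'],
                      'year': ['2015', np.nan, '2012']})

# Keep only usable replacements: ids present in test1 with a non-NaN year.
repl = test2[test2['id_'].isin(test1['id_'])].dropna()

# keep='last' lets the row from repl win over the old row with the same id_.
combined = pd.concat([test1, repl]).drop_duplicates('id_', keep='last')

# Put the rows back in test1's order (assumes id_ is unique in test1).
result = (combined.set_index('id_')
                  .reindex(test1['id_'])
                  .reset_index())
print(result)
```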

Here's a merge-based alternative, which keeps the original row order because a left merge preserves the order of the left keys:

test3 = test1.merge(test2, on='id_', how='left')
test3['year'] = test3.pop('year_y').fillna(test3.pop('year_x'))
test3

  id_  year
0   a  2015
1   b  2019
2   c  2017
3   d  2019
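A variation of the merge approach with explicit suffixes and a uniqueness check might look like this. The function name replace_via_merge and the suffix names are hypothetical; validate='many_to_one' makes merge raise a MergeError if an id_ appears more than once in test2, which would otherwise duplicate rows in the result.

```python
import numpy as np
import pandas as pd

test1 = pd.DataFrame({'id_': ['a', 'b', 'c', 'd'],
                      'year': ['2018', '2019', '2017', '2019']})
test2 = pd.DataFrame({'id_': ['a', 'c', 'e'],
                      'year': ['2015', np.nan, '2012']})


def replace_via_merge(left: pd.DataFrame, right: pd.DataFrame) -> pd.DataFrame:
    """Left-merge on id_, prefer right's year, fall back to left's where NaN."""
    out = left.merge(right, on='id_', how='left',
                     suffixes=('_old', '_new'),
                     validate='many_to_one')  # right's id_ must be unique
    # pop removes both helper columns; fillna fills NaN in the new values.
    out['year'] = out.pop('year_new').fillna(out.pop('year_old'))
    return out


result = replace_via_merge(test1, test2)
print(result)
```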
