Python pandas dataframe fillna with column in different dataframe

Question

I want to fill Nan values in 'gvkey' column with another column values in different dataframe.

df
     wrds_id    isin_code   gvkey   gvkey_new
 0  1004    US0003611052    1004.0  1004.0
 1  1005    US1948302047    Nan     1000.0
 2  1006    US1948302047    Nan     1004.0
 3  1007    US0309541011    Nan     1004.0
 4  1007    US0003611052    1004.0  1004.0
 5  1008    IL0006046119    Nan     1004.0
 6  1008    US0003611052    1004.0  1004.0
 7  1009    US4448591028    Nan     1004.0
 8  1004    US4448591028    Nan     1004.0
 9  1004    US4448591028    Nan     1004.0
 10 1013    US0008861017    1013.0  1013.0
 11 1013    BE0003755692    Nan     1013.0
 12 1013    BE0003755692    Nan     1013.0

Using this frame, based on isin_code values, replace Nan value with gvkey_ciq_new values in first dataframe

df1
    isin_code   gvkey_ciq_new
0   US0309541011    1485.0
1   IL0006046119    2018.0
3   US1948302047    3176.0
4   US2376881064    3760.0
5   BE0003755692    5150.0
7   US4448591028    5776.0
9   GB0004544929    5898.0

Final dataframe I want to make,

finaldf
     wrds_id    isin_imp    gvkey   gvkey_new
 0  1004    US0003611052    1004.0  1004.0
 1  1005    US1948302047    3176.0  1004.0
 2  1006    US1948302047    3176.0  1004.0
 3  1007    US0309541011    1485.0  1004.0
 4  1007    US0003611052    1004.0  1004.0
 5  1008    IL0006046119    2018.0  1004.0
 6  1008    US0003611052    1004.0  1004.0
 7  1009    US4448591028    5776.0  1004.0
 8  1004    US4448591028    5776.0  1004.0
 9  1004    US4448591028    5776.0  1004.0
 10 1013    US0008861017    1013.0  1013.0
 11 1013    BE0003755692    5150.0  1013.0
 12 1013    BE0003755692    5150.0  1013.0

How can I use map function to make the dataframe?

Answer 1

Don't use any loops. Merge the dataframes and use numpy.where with pandas.inull

Set things up

from io import StringIO

import numpy
import pandas

d1 = StringIO("""\
     wrds_id    isin_code   gvkey   gvkey_new
 0  1004    US0003611052    1004.0  1004.0
 1  1005    US1948302047    Nan     1000.0
 2  1006    US1948302047    Nan     1004.0
 3  1007    US0309541011    Nan     1004.0
 4  1007    US0003611052    1004.0  1004.0
 5  1008    IL0006046119    Nan     1004.0
 6  1008    US0003611052    1004.0  1004.0
 7  1009    US4448591028    Nan     1004.0
 8  1004    US4448591028    Nan     1004.0
 9  1004    US4448591028    Nan     1004.0
 10 1013    US0008861017    1013.0  1013.0
 11 1013    BE0003755692    Nan     1013.0
 12 1013    BE0003755692    Nan     1013.0
 """)

d2 = StringIO("""\
    isin_code   gvkey_ciq_new
0   US0309541011    1485.0
1   IL0006046119    2018.0
3   US1948302047    3176.0
4   US2376881064    3760.0
5   BE0003755692    5150.0
7   US4448591028    5776.0
9   GB0004544929    5898.0
""")
df1 = pandas.read_table(d1, sep='\s+', na_values=['Nan'])
df2 = pandas.read_table(d2, sep='\s+', na_values=['Nan'])

Merge and compute the final column

result = (
    df1.merge(df2, on='isin_code', how='left')
       .assign(gvkey=lambda df: numpy.where(
           pandas.isnull(df['gvkey']),  # condition
           df['gvkey_ciq_new'],         # values if true
           df['gvkey']                  # values if false
       ))[df1.columns]                  # only keep original columns
)

print(result)

And that gives me:

    wrds_id     isin_code   gvkey  gvkey_new
0      1004  US0003611052  1004.0     1004.0
1      1005  US1948302047  3176.0     1000.0
2      1006  US1948302047  3176.0     1004.0
3      1007  US0309541011  1485.0     1004.0
4      1007  US0003611052  1004.0     1004.0
5      1008  IL0006046119  2018.0     1004.0
6      1008  US0003611052  1004.0     1004.0
7      1009  US4448591028  5776.0     1004.0
8      1004  US4448591028  5776.0     1004.0
9      1004  US4448591028  5776.0     1004.0
10     1013  US0008861017  1013.0     1013.0
11     1013  BE0003755692  5150.0     1013.0
12     1013  BE0003755692  5150.0     1013.0

Answer 2

First, make a temporary df without NaNs. Then you can use boolean indexing:

df_tmp = df[df.gvkey != pd.np.nan]
for code, gv in zip(df_tmp["isin_code"], df_tmp["gvkey"]):
    df1.at[df1.isin_code == code,"gvkey"] = gv

Maybe not the most elegant solution, but it should work.

EDIT : Or you add in the loop something like

if gv != pd.np.nan:

Then you don't need the df_tmp

Python pandas dataframe fillna with column in different dataframe

Question

2 answers

solution1
2 ACCPTED 2017-10-01 21:01:02

Set things up

Merge and compute the final column

And that gives me:

solution2
0 2017-09-27 23:21:46

Python pandas dataframe fillna with column in different dataframe

Question

2 answers

solution1 2 ACCPTED 2017-10-01 21:01:02

Set things up

Merge and compute the final column

And that gives me:

solution2 0 2017-09-27 23:21:46

solution1
2 ACCPTED 2017-10-01 21:01:02

solution2
0 2017-09-27 23:21:46