I want to fill Nan values in 'gvkey' column with another column values in different dataframe.
df
wrds_id isin_code gvkey gvkey_new
0 1004 US0003611052 1004.0 1004.0
1 1005 US1948302047 Nan 1000.0
2 1006 US1948302047 Nan 1004.0
3 1007 US0309541011 Nan 1004.0
4 1007 US0003611052 1004.0 1004.0
5 1008 IL0006046119 Nan 1004.0
6 1008 US0003611052 1004.0 1004.0
7 1009 US4448591028 Nan 1004.0
8 1004 US4448591028 Nan 1004.0
9 1004 US4448591028 Nan 1004.0
10 1013 US0008861017 1013.0 1013.0
11 1013 BE0003755692 Nan 1013.0
12 1013 BE0003755692 Nan 1013.0
Using this frame, based on isin_code values, replace Nan value with gvkey_ciq_new values in first dataframe
df1
isin_code gvkey_ciq_new
0 US0309541011 1485.0
1 IL0006046119 2018.0
3 US1948302047 3176.0
4 US2376881064 3760.0
5 BE0003755692 5150.0
7 US4448591028 5776.0
9 GB0004544929 5898.0
Final dataframe I want to make,
finaldf
wrds_id isin_imp gvkey gvkey_new
0 1004 US0003611052 1004.0 1004.0
1 1005 US1948302047 3176.0 1004.0
2 1006 US1948302047 3176.0 1004.0
3 1007 US0309541011 1485.0 1004.0
4 1007 US0003611052 1004.0 1004.0
5 1008 IL0006046119 2018.0 1004.0
6 1008 US0003611052 1004.0 1004.0
7 1009 US4448591028 5776.0 1004.0
8 1004 US4448591028 5776.0 1004.0
9 1004 US4448591028 5776.0 1004.0
10 1013 US0008861017 1013.0 1013.0
11 1013 BE0003755692 5150.0 1013.0
12 1013 BE0003755692 5150.0 1013.0
How can I use map function to make the dataframe?
Don't use any loops. Merge the dataframes and use numpy.where
with pandas.inull
from io import StringIO
import numpy
import pandas
d1 = StringIO("""\
wrds_id isin_code gvkey gvkey_new
0 1004 US0003611052 1004.0 1004.0
1 1005 US1948302047 Nan 1000.0
2 1006 US1948302047 Nan 1004.0
3 1007 US0309541011 Nan 1004.0
4 1007 US0003611052 1004.0 1004.0
5 1008 IL0006046119 Nan 1004.0
6 1008 US0003611052 1004.0 1004.0
7 1009 US4448591028 Nan 1004.0
8 1004 US4448591028 Nan 1004.0
9 1004 US4448591028 Nan 1004.0
10 1013 US0008861017 1013.0 1013.0
11 1013 BE0003755692 Nan 1013.0
12 1013 BE0003755692 Nan 1013.0
""")
d2 = StringIO("""\
isin_code gvkey_ciq_new
0 US0309541011 1485.0
1 IL0006046119 2018.0
3 US1948302047 3176.0
4 US2376881064 3760.0
5 BE0003755692 5150.0
7 US4448591028 5776.0
9 GB0004544929 5898.0
""")
df1 = pandas.read_table(d1, sep='\s+', na_values=['Nan'])
df2 = pandas.read_table(d2, sep='\s+', na_values=['Nan'])
result = (
df1.merge(df2, on='isin_code', how='left')
.assign(gvkey=lambda df: numpy.where(
pandas.isnull(df['gvkey']), # condition
df['gvkey_ciq_new'], # values if true
df['gvkey'] # values if false
))[df1.columns] # only keep original columns
)
print(result)
wrds_id isin_code gvkey gvkey_new
0 1004 US0003611052 1004.0 1004.0
1 1005 US1948302047 3176.0 1000.0
2 1006 US1948302047 3176.0 1004.0
3 1007 US0309541011 1485.0 1004.0
4 1007 US0003611052 1004.0 1004.0
5 1008 IL0006046119 2018.0 1004.0
6 1008 US0003611052 1004.0 1004.0
7 1009 US4448591028 5776.0 1004.0
8 1004 US4448591028 5776.0 1004.0
9 1004 US4448591028 5776.0 1004.0
10 1013 US0008861017 1013.0 1013.0
11 1013 BE0003755692 5150.0 1013.0
12 1013 BE0003755692 5150.0 1013.0
First, make a temporary df without NaNs. Then you can use boolean indexing:
df_tmp = df[df.gvkey != pd.np.nan]
for code, gv in zip(df_tmp["isin_code"], df_tmp["gvkey"]):
df1.at[df1.isin_code == code,"gvkey"] = gv
Maybe not the most elegant solution, but it should work.
EDIT : Or you add in the loop something like
if gv != pd.np.nan:
Then you don't need the df_tmp
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.