Suppose I have the following main df:
import numpy as np
import pandas as pd

df = pd.DataFrame({'name': ['Sara', 'John', 'Christine']})
df:
name
0 Sara
1 John
2 Christine
Now I have 4 other dfs with age and grade for the same 3 names, but with NaNs in different positions:
df2:
df2 = pd.DataFrame({'name':['Sara', 'John', 'Christine'],
'age': [26, 30, np.nan]})
df3:
df3 = pd.DataFrame({'name': ['Sara', 'John', 'Christine'],
'age': [np.nan, 30, 24]})
df4:
df4 = pd.DataFrame({'name': ['Sara', 'John', 'Christine'],
'grade': [np.nan, 1, 3]})
df5:
df5 = pd.DataFrame({'name': ['Sara', 'John', 'Christine'],
'grade': [12, np.nan, 3]})
I want to merge the data from the 4 dataframes into the main df on the name column and remove the NaNs.
What I did so far:
Created a list of dfs:
dfs = [df,df2,df3,df4,df5]
Used reduce:
from functools import reduce
df_final = reduce(lambda left,right: pd.merge(left,right,on='name'), dfs)
df_final:
name age_x age_y grade_x grade_y
0 Sara 26.0 NaN NaN 12.0
1 John 30.0 30.0 1.0 NaN
2 Christine NaN 24.0 3.0 3.0
Expected output:
df_final:
name age grade
0 Sara 26.0 12.0
1 John 30.0 1.0
2 Christine 24.0 3.0
We can stack the frames into long form with concat, then use groupby first to retrieve the first valid (non-NaN) entry for each column per name:
merged = (
pd.concat(dfs).groupby('name', sort=False, as_index=False).first()
)
merged:
name age grade
0 Sara 26.0 12.0
1 John 30.0 1.0
2 Christine 24.0 3.0
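For reference, here is a self-contained sketch of the approach above, rebuilt from the frames in the question. The names long and combined are introduced here for illustration only. The second half shows one possible alternative using combine_first on frames indexed by name; it is an assumption on my part that this suits your data, and it skips the main df because that frame only carries the names.

```python
from functools import reduce

import numpy as np
import pandas as pd

names = ['Sara', 'John', 'Christine']
df = pd.DataFrame({'name': names})
df2 = pd.DataFrame({'name': names, 'age': [26, 30, np.nan]})
df3 = pd.DataFrame({'name': names, 'age': [np.nan, 30, 24]})
df4 = pd.DataFrame({'name': names, 'grade': [np.nan, 1, 3]})
df5 = pd.DataFrame({'name': names, 'grade': [12, np.nan, 3]})
dfs = [df, df2, df3, df4, df5]

# Stack every frame vertically. A column that a frame lacks is filled
# with NaN for that frame's rows, so each name appears once per frame.
long = pd.concat(dfs, ignore_index=True)

# GroupBy.first() takes the first non-null value of each column within
# each group; sort=False keeps the names in order of first appearance.
merged = long.groupby('name', sort=False, as_index=False).first()

# Alternative sketch: index each data frame by name and fill gaps from
# left to right with combine_first, then restore the original name order.
combined = reduce(
    lambda left, right: left.combine_first(right),
    [d.set_index('name') for d in dfs[1:]],
)
combined = combined.reindex(names).reset_index()

print(merged)
print(combined)
```

Note that both variants keep the first non-null value they encounter. If two frames disagree about a name's age or grade, the value from the frame that comes earlier in dfs wins silently, so it may be worth checking for conflicts first if that can happen in your data.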