
Impute Pandas dataframe column with value from another row based on ID column

df:

id   name 
0    toto                    
1    tata
0    NaN

I would like to impute the missing value in the name column (third row) using the name of another row that has the same id. The desired dataframe would be:

id   name 
0    toto                    
1    tata
0    toto

I tried the following:

df.loc[df.name.isna(), "name"] = df["id"].map(df["name"])

but it is not working.

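import pandas as pd

df = pd.DataFrame({'id': [0, 1, 0],
                   'name': ['toto', 'tata', pd.NA]})

# Keep the rows whose name is known, one row per distinct (id, name) pair.
known = df[pd.notna(df['name'])].drop_duplicates()

# Left-join the id column onto the known names so each row gets its id's name.
df = df[['id']].merge(known, how='left', on='id')
df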
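The map-based attempt from the question can also be adapted. When Series.map is given a Series, it looks values up by that Series' index labels, so df["id"].map(df["name"]) treats each id as a row label rather than as a key. A minimal sketch, assuming every id has at most one distinct non-missing name (the name `lookup` is only illustrative):

```python
import pandas as pd

df = pd.DataFrame({"id": [0, 1, 0], "name": ["toto", "tata", pd.NA]})

# Build an id -> name lookup from the rows whose name is present.
# Assumption: every id has at most one distinct non-missing name.
lookup = (
    df.dropna(subset=["name"])
      .drop_duplicates(subset="id")
      .set_index("id")["name"]
)

# Fill only the missing names, looking each row's id up in the table above.
df["name"] = df["name"].fillna(df["id"].map(lookup))
print(df)
```

Rows that already have a name are left unchanged, and an id with no known name anywhere stays missing.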

If each group contains only one distinct non-missing value, you can try:

df = df.groupby('id').apply(lambda g: g.ffill().bfill())
print(df)

   name
0  toto
1  tata
2  toto

Or sort the NaN values last and forward-fill within each group:

df = (df.sort_values('name')
      .groupby('id').ffill()
      .sort_index())
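Note that groupby('id').ffill() returns only the non-grouping columns, so the id column is not part of the result. If the id column should be kept, one possible variant is to fill just the name column in place with transform. This is a sketch under the same assumption of one distinct non-missing name per id, not necessarily what the answer's author intended:

```python
import pandas as pd

df = pd.DataFrame({"id": [0, 1, 0], "name": ["toto", "tata", pd.NA]})

# Fill within each id group, forward then backward, touching only "name".
# transform returns a Series aligned to the original index, so the
# assignment keeps the row order and the id column intact.
df["name"] = df.groupby("id")["name"].transform(lambda s: s.ffill().bfill())
print(df)
```

Because the fill runs in both directions, the position of the missing row within its group does not matter and no sorting is needed.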
