
Impute categorical missing values in scikit-learn using specific column

I have a data set of patients and want to handle its missing values. It contains both numeric and text columns, and I want to impute per subject_id rather than over whole columns. The data set looks like this:

 subject_id     time      heart_rate      blood_pressure    urine_color 
   1             1.10          23              60                red
   1              2                            40                
   2             3             60              80              
   2             4                                            dark yellow 

I want to replace missing text values with that patient's most frequent value and missing numeric values with that patient's mean, so the result looks like this:

 subject_id     time      heart_rate      blood_pressure    urine_color 
   1             1.10          23              60                red
   1              2            23              40                red
   2             3             60              80              dark yellow 
   2             4             60              80              dark yellow

Can anyone help with this? Every imputation method I have found uses the most frequent value in the column or a statistic computed over the whole column.

Use GroupBy.transform with a custom function that returns the mean for numeric columns and the mode for categorical columns, then fill the missing values with DataFrame.fillna:

import numpy as np
f = lambda x: x.mean() if np.issubdtype(x.dtype, np.number) else x.mode().iat[0]

Alternative if a group may contain only NaN values in a categorical column:

f = lambda x: x.mean() if np.issubdtype(x.dtype, np.number) else next(iter(x.mode()), None)

# fill each patient's gaps in every column except the grouping key
cols = df.columns.difference(['subject_id'])
df[cols] = df[cols].fillna(df.groupby('subject_id')[cols].transform(f))
print(df)
   subject_id time  heart_rate  blood_pressure  urine_color
0           1  1.1          23              60          red
1           1    2          23              40          red
2           2    3          60              80  dark yellow
3           2    4          60              80  dark yellow
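
For anyone who wants to reproduce this end to end, here is a minimal sketch that rebuilds the sample frame from the question and applies the same idea. The named helper `fill_value` and the per-column loop are my own variation for readability, not part of the answer above; the frame is typed in by hand from the question's table, with the blanks assumed to be NaN/None.

```python
import numpy as np
import pandas as pd

# Sample data retyped from the question; blank cells are assumed to be missing.
df = pd.DataFrame({
    "subject_id": [1, 1, 2, 2],
    "time": [1.10, 2, 3, 4],
    "heart_rate": [23, np.nan, 60, np.nan],
    "blood_pressure": [60, 40, 80, np.nan],
    "urine_color": ["red", None, None, "dark yellow"],
})

def fill_value(s):
    """Hypothetical helper: per-group mean for numeric data, else first mode."""
    if pd.api.types.is_numeric_dtype(s):
        return s.mean()
    modes = s.mode()  # missing values are dropped by default
    return modes.iat[0] if not modes.empty else None

# Fill each column from the statistic of the same patient's rows.
for col in df.columns.difference(["subject_id"]):
    per_patient = df.groupby("subject_id")[col].transform(fill_value)
    df[col] = df[col].fillna(per_patient)

print(df)

# A group with no observed category has no mode, so the helper returns None
# and the gap is left unfilled instead of raising an error.
print(fill_value(pd.Series([None, None], dtype=object)))
```

Note that a group with no observed value at all in a column stays missing after this step, so a fallback such as a whole-column statistic may still be needed for those rows.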
