Impute categorical missing values in scikit-learn using specific column

Question

I have a dataset of patients and I want to handle its missing values. It contains both numerical and text columns, and I want to impute per patient (based on subject_id) rather than over whole columns. The dataset looks like this:

subject_id  time  heart_rate  blood_pressure  urine_color
1           1.10  23          60              red
1           2     NaN         40              NaN
2           3     60          80              NaN
2           4     NaN         NaN             dark yellow

I want to replace missing text values with the patient's most frequent value, and missing numeric values with the patient's mean, so that the result looks like this:

subject_id  time  heart_rate  blood_pressure  urine_color
1           1.10  23          60              red
1           2     23          40              red
2           3     60          80              dark yellow
2           4     60          80              dark yellow

Can anyone help with this? Every imputation method I have found uses the most frequent value of the whole column, or a statistic computed over the whole column.

Answer 1

Use GroupBy.transform with a custom function that returns the mean for numeric columns and the mode for categorical columns, then replace the missing values with DataFrame.fillna:

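import numpy as np
import pandas as pd

# Fill value per group: mean for numeric columns, most frequent value (mode) otherwise
f = lambda x: x.mean() if np.issubdtype(x.dtype, np.number) else x.mode().iat[0]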

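Alternatively, if a categorical column can be entirely NaN within a group, use the version below. In that case x.mode() is empty, so .iat[0] would raise an IndexError, whereas next(iter(x.mode()), None) returns None and leaves those cells missing: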

f = lambda x: x.mean() if np.issubdtype(x.dtype, np.number) else next(iter(x.mode()), None)

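# Impute every column except the grouping key
cols = df.columns.difference(['subject_id'])
# transform returns one fill value per group, broadcast to every row of that group;
# fillna then replaces only the cells that are missing
df[cols] = df[cols].fillna(df.groupby('subject_id')[cols].transform(f))
print(df)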
   subject_id time  heart_rate  blood_pressure  urine_color
0           1  1.1          23              60          red
1           1    2          23              40          red
2           2    3          60              80  dark yellow
3           2    4          60              80  dark yellow
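A self-contained sketch of the same groupby-then-fillna idea, applied one column at a time with SeriesGroupBy.transform. The sample frame is the question's data with the blank cells assumed to be np.nan. Subject 3 and the helper name fill_by_group are hypothetical additions; the extra subject has no urine_color at all, to exercise the next(iter(...), None) fallback.

```python
import numpy as np
import pandas as pd

# Question's sample data; blank cells are assumed to be np.nan.
# Subject 3 is a hypothetical extra patient whose urine_color is entirely missing.
df = pd.DataFrame({
    "subject_id": [1, 1, 2, 2, 3],
    "time": [1.10, 2, 3, 4, 5],
    "heart_rate": [23, np.nan, 60, np.nan, 70],
    "blood_pressure": [60, 40, 80, np.nan, 90],
    "urine_color": ["red", np.nan, np.nan, "dark yellow", np.nan],
})


def fill_by_group(frame, key):
    """Hypothetical helper: fill NaNs per group, mean for numeric columns, mode otherwise."""
    out = frame.copy()
    groups = frame.groupby(key)
    for col in frame.columns.difference([key]):
        if pd.api.types.is_numeric_dtype(frame[col]):
            # Group mean, broadcast back to every row of the group
            fill = groups[col].transform("mean")
        else:
            # Most frequent value per group; None when the group has no values,
            # so those cells simply stay missing
            fill = groups[col].transform(lambda s: next(iter(s.mode()), None))
        out[col] = frame[col].fillna(fill)
    return out


result = fill_by_group(df, "subject_id")
print(result)
```

Because each transform call sees a single column, the numeric columns stay float64 after filling and no dtype check over a whole group DataFrame is needed; the fill values are the same as in the one-line version above.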

solution1
0 ACCEPTED 2019-12-03 07:19:05