Imputing Missing Values in Python
I want to impute a couple of columns in my data frame using Scikit-Learn SimpleImputer. I tried doing this, but with no luck. How should I modify my code?
a, b, e are the columns in my data frame that I want to impute.
My data frame:
a b c d e
NA 39 cat gray 20
5 NA dog brown NA
7 53 cat tan 33
NA NA cat black 41
4 24 dog tan NA
My code:
from sklearn.impute import SimpleImputer
miss_mean_imputer = SimpleImputer(missing_values='NaN', strategy='mean', axis=0)
miss_mean_imputer = miss_mean_imputer.fit(df["a", "b", "e"])
imputed_df = miss_mean_imputer.transform(df.values)
print(imputed_df)
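You should replace missing_values='NaN' with missing_values=np.nan when instantiating the imputer, because pandas represents the missing entries as the float np.nan, not as the string 'NaN'. You should also drop axis=0, since SimpleImputer has no axis parameter (it always imputes column by column), and select the columns with a list, df[['a', 'b', 'e']], because df["a", "b", "e"] looks up a single column keyed by the tuple ('a', 'b', 'e'). Finally, make sure the imputer transforms the same columns it was fitted on, rather than df.values, which holds all five columns including the string columns c and d. See the code below.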
import pandas as pd
import numpy as np
from sklearn.impute import SimpleImputer
df = pd.DataFrame({
'a': [np.nan, 5.0, 7.0, np.nan, 4.0],
'b': [39.0, np.nan, 53.0, np.nan, 24.0],
'c': ['cat', 'dog', 'cat', 'cat', 'dog'],
'd': ['gray', 'brown', 'tan', 'black', 'tan'],
'e': [20.0, np.nan, 33.0, 41.0, np.nan]
})
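# Treat np.nan (not the string 'NaN') as the missing-value marker
imputer = SimpleImputer(missing_values=np.nan, strategy='mean')
# Learn the column means from the numeric columns only
imputer = imputer.fit(df[['a', 'b', 'e']])
# Transform those same columns and write them back into a copy of the frame
imputed_df = df.copy()
imputed_df[['a', 'b', 'e']] = imputer.transform(df[['a', 'b', 'e']])
print(imputed_df)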
# a b c d e
# 0 5.333333 39.000000 cat gray 20.000000
# 1 5.000000 38.666667 dog brown 31.333333
# 2 7.000000 53.000000 cat tan 33.000000
# 3 5.333333 38.666667 cat black 41.000000
# 4 4.000000 24.000000 dog tan 31.333333
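To see what fit actually learns, and why transform should receive the same columns, the sketch below inspects the fitted imputer's statistics_ attribute (one fill value per fitted column, here the column means) and compares it with pandas' DataFrame.mean(), which skips NaN by default. It then applies the already-fitted imputer to new_rows, a small made-up frame with the same three columns that is not part of the question; its missing entries are filled with the means learned from df rather than re-estimated.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Same numeric columns as in the question's data frame
df = pd.DataFrame({
    'a': [np.nan, 5.0, 7.0, np.nan, 4.0],
    'b': [39.0, np.nan, 53.0, np.nan, 24.0],
    'e': [20.0, np.nan, 33.0, 41.0, np.nan],
})
cols = ['a', 'b', 'e']

imputer = SimpleImputer(missing_values=np.nan, strategy='mean').fit(df[cols])

# statistics_ holds one fill value per fitted column (the means for strategy='mean')
print(imputer.statistics_)
# pandas' mean() skips NaN by default, so it gives the same numbers
print(df[cols].mean().to_numpy())

# Hypothetical new rows with the same three columns, in the same order
new_rows = pd.DataFrame({
    'a': [np.nan, 1.0],
    'b': [10.0, np.nan],
    'e': [np.nan, 2.0],
})
# The means learned from df are reused; nothing is recomputed from new_rows
print(imputer.transform(new_rows[cols]))
```

Passing a frame whose columns differ from the fitted ones, such as df.values with all five columns, is rejected with an error instead of being imputed, which is the mismatch described in the answer.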