How to impute missing values with KNN
I'm trying to impute missing values in my data frames, and for this I use the fancyimpute library.
from fancyimpute import KNN
X_filled_knn = KNN(k=3).complete(df_OppLine[['family']])
I got this error:
AttributeError Traceback (most recent call last)
<ipython-input-28-8475f35fc36a> in <module>()
----> 1 X_filled_knn = KNN(k=3).complete(df_OppLine[['family']])
AttributeError: 'KNN' object has no attribute 'complete'
Any ideas on how to fix this error?
The error says that the `KNN` object in your installed version of fancyimpute has no `complete` method. Try changing the call to `fit_transform`:
from fancyimpute import KNN
X_filled_knn = KNN(k=3).fit_transform(df_OppLine[['family']])
First, you have to convert the strings into numerical data.
Try one-hot encoding, which creates one column per category; each row has a 1 in the column of its own category and 0 in all the others. You can also try ordinal encoding, which assigns a single numeric value to each category.
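As a rough illustration of the one-hot option, here is a minimal sketch using `pandas.get_dummies`. The small data frame and its category values ("A", "B") are made up for the example and only stand in for `df_OppLine`.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for df_OppLine; the category values are invented.
df = pd.DataFrame({"family": ["A", "B", np.nan, "A"]})

# One column per category. By default a row whose value is missing
# gets 0 in every column rather than a column of its own.
one_hot = pd.get_dummies(df["family"], prefix="family", dtype=int)
print(one_hot)
```

Note that the missing row becomes all zeros here, so if you want to impute it afterwards you would need to put NaN back into those cells yourself. The ordinal-encoding alternative, which keeps the NaN values in place, looks like this: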
import numpy as np
from sklearn.preprocessing import OrdinalEncoder
# Create Ordinal encoder
initialize_encoder=OrdinalEncoder()
# Select non-null values of family column
family=df_OppLine["family"]
family_not_null=family[family.notnull()]
# Reshape family_not_null to shape (-1, 1)
reshaped_vals=family_not_null.values.reshape(-1,1)
# Ordinally encode reshaped_vals
encoded_vals=initialize_encoder.fit_transform(reshaped_vals)
# Assign back encoded values to non-null values
df_OppLine.loc[family.notnull(),"family"]=np.squeeze(encoded_vals)
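Once the column is numeric, the imputation itself can run. The sketch below puts the steps together under a few assumptions: it uses scikit-learn's `KNNImputer` instead of fancyimpute's `KNN` (a swapped-in but similar technique), and it adds an invented numeric column `amount`, because KNN needs at least one other observed feature to measure distances between rows. With only the `family` column there is nothing to compare rows on.

```python
import numpy as np
import pandas as pd
from sklearn.impute import KNNImputer
from sklearn.preprocessing import OrdinalEncoder

# Hypothetical stand-in for df_OppLine; "amount" is an invented numeric
# column that gives KNN something to measure distances on.
df = pd.DataFrame({
    "family": ["A", "A", np.nan, "B", "B", np.nan],
    "amount": [1.0, 1.1, 1.05, 9.0, 9.2, 9.1],
})

# Ordinally encode only the non-null categories, leaving NaN in place.
encoder = OrdinalEncoder()
mask = df["family"].notnull()
codes = pd.Series(np.nan, index=df.index)
codes[mask] = encoder.fit_transform(df.loc[mask, ["family"]]).ravel()

# Impute the encoded column from the 2 nearest rows (distance uses "amount").
features = pd.DataFrame({"family": codes, "amount": df["amount"]})
imputed = KNNImputer(n_neighbors=2).fit_transform(features)

# Round to the nearest valid code and map back to the original labels.
filled_codes = np.rint(imputed[:, 0]).reshape(-1, 1)
df["family"] = encoder.inverse_transform(filled_codes).ravel()
print(df)
```

Rounding an averaged code back to a category is a heuristic: ordinal codes impose an order that the categories may not really have, so for unordered categories it is worth checking whether the imputed labels make sense.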