sklearn.impute SimpleImputer: why does transform() need fit_transform() first?
sklearn provides the transform() method to apply the imputer. To use transform(), it seems that fit_transform() has to be called first. Otherwise, running the following code:
import numpy as np
from sklearn.impute import SimpleImputer

df = np.array([[1, 1], [2, 1], [3, 2], [np.nan, 2]])
my_imputer = SimpleImputer()
my_imputer.transform(df)  # called without fitting first
raises this error:
NotFittedError: This SimpleImputer instance is not fitted yet. Call 'fit' with appropriate arguments before using this method.
Calling fit_transform() before transform():
my_imputer.fit_transform(df)
my_imputer.transform(df)
fixes this error.
The question is: why does transform() need fit_transform() first?
During fit(), the imputer learns a statistic for each column of the data (the mean, median, etc., depending on the chosen strategy). That statistic is then used to fill in the missing values during transform().
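As a rough illustration of what "learning" means here, the sketch below fits an imputer on the array from the question and reads back the values it stored. It assumes the default mean strategy (spelled out explicitly) and uses the fitted attribute statistics_; the variable names are only illustrative.

```python
import numpy as np
from sklearn.impute import SimpleImputer

X = np.array([[1, 1], [2, 1], [3, 2], [np.nan, 2]])

imputer = SimpleImputer(strategy="mean")
imputer.fit(X)

# One learned value per column, computed while ignoring NaN:
# column 0 -> (1 + 2 + 3) / 3 = 2.0, column 1 -> (1 + 1 + 2 + 2) / 4 = 1.5
learned = imputer.statistics_
print(learned)

# transform() only plugs the stored values into the missing cells
X_filled = imputer.transform(X)
print(X_filled[3, 0])  # the former NaN, now the column 0 mean
```

Without the fit() call there is nothing stored to plug in, which is why transform() on a fresh imputer fails.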
fit_transform() is just a shorthand for combining the two methods. So essentially:
fit(X, y): Learns the required aspects of the supplied data and returns the same estimator object, now holding the learned parameters. It does not change the supplied data in any way.
transform(): Actually transforms the supplied data into the new form, using the parameters learned during fit().
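These points can be checked directly. The snippet below is a minimal sketch on the array from the question, with default SimpleImputer settings; the helper variable names (returned, same_object and so on) are made up for the illustration.

```python
import numpy as np
from sklearn.impute import SimpleImputer

X = np.array([[1, 1], [2, 1], [3, 2], [np.nan, 2]])

imputer = SimpleImputer()
returned = imputer.fit(X)

# fit() hands back the estimator it was called on, now fitted
same_object = returned is imputer

# fit() leaves the input alone: the missing value is still missing
input_still_has_nan = bool(np.isnan(X[3, 0]))

# fit_transform(X) matches fit(X) followed by transform(X)
two_steps = imputer.transform(X)
one_step = SimpleImputer().fit_transform(X)
same_result = np.array_equal(two_steps, one_step)

print(same_object, input_still_has_nan, same_result)
```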
fit_transform(df) does not have to be called before transform(). Only fit() needs to be called. Generally, the sequence you described is used with a train/test split of the data. Something like:
# Learn the parameters from the training data and transform it, in a single step.
X_train_new = my_imputer.fit_transform(X_train)
# We don't want to learn anything from the test data, only change it according to previously learnt information.
X_test_new = my_imputer.transform(X_test)
The above code snippet can be broken into:
# It learns about the data and does nothing else
my_imputer.fit(X_train)
# Calling transform to apply the learnt information on supplied data
X_train_new = my_imputer.transform(X_train)
X_test_new = my_imputer.transform(X_test)
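To make the train/test pattern concrete, here is a self-contained sketch. The two small arrays are hypothetical stand-ins for a real split, and the mean strategy is assumed; the point is only that the test set is filled with values learned from the training set.

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Hypothetical toy data standing in for a real train/test split
X_train = np.array([[1.0, 10.0], [3.0, np.nan], [5.0, 30.0]])
X_test = np.array([[np.nan, 100.0], [100.0, np.nan]])

my_imputer = SimpleImputer(strategy="mean")

# Learns the training column means (3.0 and 20.0) and fills the training NaN
X_train_new = my_imputer.fit_transform(X_train)

# Reuses the training means; the test values (100.0) do not influence them
X_test_new = my_imputer.transform(X_test)

print(X_test_new)
```

The missing test entries come out as 3.0 and 20.0, the training means, even though the observed test values are all 100.0. Calling fit_transform() on the test set instead would recompute the statistics from the test data.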