
sklearn.impute SimpleImputer: why does transform() need fit_transform() first?

sklearn's SimpleImputer provides a transform() method to fill in missing values.

To use transform(), it seems that fit_transform() has to be called first. Otherwise, running

import numpy as np
from sklearn.impute import SimpleImputer

df = np.array([[1, 1], [2, 1], [3, 2], [np.nan, 2]])
my_imputer = SimpleImputer()
my_imputer.transform(df)  # the imputer is not fitted yet, so this raises NotFittedError

produces this error:

NotFittedError: This SimpleImputer instance is not fitted yet. Call 'fit' with appropriate arguments before using this method.

Calling fit_transform() before transform():

my_imputer.fit_transform(df)
my_imputer.transform(df)

fixes the error.

The question is: why does transform() need fit_transform()?

During fit(), the imputer learns a statistic of the data for each column (the mean, the median, etc., depending on the chosen strategy). During transform(), that stored statistic is used to fill in the missing values.
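The following is a minimal sketch of that behaviour, not taken from the original post. It assumes the default "mean" strategy and reuses the small array from the question. The learned values are read from the fitted attribute statistics_.

```python
import numpy as np
from sklearn.exceptions import NotFittedError
from sklearn.impute import SimpleImputer

X = np.array([[1, 1], [2, 1], [3, 2], [np.nan, 2]])
imputer = SimpleImputer(strategy="mean")  # "mean" is also the default

# Before fit() the imputer has no statistic to fill the gaps with.
try:
    imputer.transform(X)
    raised = False
except NotFittedError:
    raised = True

# fit() computes one statistic per column, ignoring the NaN entries.
imputer.fit(X)
learned = imputer.statistics_

# transform() substitutes the stored statistic for each missing value.
X_filled = imputer.transform(X)

print("NotFittedError before fit:", raised)
print("learned column means:", learned)
print("last row after imputation:", X_filled[3])
```

Here the first column's mean is taken over the three observed values 1, 2 and 3, so the missing entry in the last row is replaced by 2.0.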

fit_transform() is just shorthand for calling the two methods one after the other. So essentially:

  • fit(X, y): Learns the required aspects of the supplied data, stores the learned parameters on the imputer, and returns the imputer itself. It does not change the supplied data in any way.

  • transform(X): Actually transforms the supplied data into the new form, using the parameters learned during fit().
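As a rough check of the two points above, the sketch below (an illustration with made-up variable names, not code from the original answer) verifies that fit() leaves the input untouched and that fit_transform() gives the same result as fit() followed by transform() on the same data.

```python
import numpy as np
from sklearn.impute import SimpleImputer

X = np.array([[1, 1], [2, 1], [3, 2], [np.nan, 2]])
X_before = X.copy()

imputer = SimpleImputer()
returned = imputer.fit(X)

# fit() hands back the same (now fitted) estimator.
same_object = returned is imputer

# The supplied data still contains its NaN after fit().
untouched = np.array_equal(X, X_before, equal_nan=True)

# Two explicit steps versus the one-step shorthand on a fresh imputer.
two_step = imputer.transform(X)
one_step = SimpleImputer().fit_transform(X)

print(same_object, untouched, np.allclose(two_step, one_step))
```

The comparison uses a fresh SimpleImputer for fit_transform() so that neither result depends on state left over from the other call.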

fit_transform(df) does not have to be called before transform(). Only fit() needs to be called. Generally, the sequence you described is used with a train/test split of the data. Something like:

# Learn the parameters from the training data and transform it in a single step.
X_train_new = my_imputer.fit_transform(X_train)

# We don't want to learn from the test data, only transform it using the previously learned parameters.
X_test_new = my_imputer.transform(X_test)

The above code snippet can be broken into:

# It learns about the data and does nothing else
my_imputer.fit(X_train)

# Calling transform to apply the learnt information on supplied data
X_train_new = my_imputer.transform(X_train)
X_test_new = my_imputer.transform(X_test)
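To make the train/test point concrete, here is a small self-contained sketch. The arrays X_train and X_test are invented for illustration. It suggests that missing values in the test set are filled with the statistics learned from the training set, not with statistics of the test set itself.

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Hypothetical data: the test set has a very different scale on purpose.
X_train = np.array([[1.0, 10.0], [3.0, 30.0], [np.nan, 20.0]])
X_test = np.array([[np.nan, 100.0], [100.0, np.nan]])

my_imputer = SimpleImputer(strategy="mean")

# Learn the column means from the training data only.
my_imputer.fit(X_train)
train_means = my_imputer.statistics_

# Apply the learned means to both sets.
X_train_new = my_imputer.transform(X_train)
X_test_new = my_imputer.transform(X_test)

print("training means:", train_means)
print("imputed test set:")
print(X_test_new)
```

The gaps in X_test are filled with 2.0 and 20.0, the training means, even though the observed test values are around 100. Calling fit_transform() on the test set instead would overwrite those learned means with ones computed from the test data.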


 