sklearn.impute SimpleImputer: why does transform() need fit_transform() first?

Question

sklearn provides a transform() method to apply a transformer (here, SimpleImputer) to data.

To use transform(), fit_transform() apparently needs to be called first. Otherwise, this code:

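import numpy as np
from sklearn.impute import SimpleImputer

# Small array with one missing value in the first column
df = np.array([[1, 1], [2, 1], [3, 2], [np.nan, 2]])
my_imputer = SimpleImputer()
my_imputer.transform(df)  # called without fitting first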

raises this error:

NotFittedError: This SimpleImputer instance is not fitted yet. Call 'fit' with appropriate arguments before using this method.

Calling fit_transform() before transform():

my_imputer.fit_transform(df)
my_imputer.transform(df)

fixes this error.
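Both behaviours can be reproduced in a single script. This is only a minimal sketch: it assumes the NotFittedError shown above is the one exported from sklearn.exceptions, and the variable names raised and result are made up for illustration.

```python
import numpy as np
from sklearn.exceptions import NotFittedError
from sklearn.impute import SimpleImputer

df = np.array([[1, 1], [2, 1], [3, 2], [np.nan, 2]])
my_imputer = SimpleImputer()

# transform() on a fresh imputer fails because nothing has been learned yet
try:
    my_imputer.transform(df)
    raised = False
except NotFittedError:
    raised = True

# After fit_transform(), the same transform() call succeeds
my_imputer.fit_transform(df)
result = my_imputer.transform(df)
```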

The question is: why does transform() need fit_transform()?

Answer 1

During fit() the imputer learns statistics of the data, such as the mean or median of each column, which are then used to fill in the missing values during transform().
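As a rough illustration of what "learning" means here, the fitted statistics can be inspected through the imputer's statistics_ attribute. The sketch below reuses the array from the question and assumes the "mean" strategy, which is SimpleImputer's default.

```python
import numpy as np
from sklearn.impute import SimpleImputer

X = np.array([[1, 1], [2, 1], [3, 2], [np.nan, 2]])

imputer = SimpleImputer(strategy="mean")
imputer.fit(X)

# Column means, computed while ignoring NaN:
# column 0 -> (1 + 2 + 3) / 3 = 2.0, column 1 -> (1 + 1 + 2 + 2) / 4 = 1.5
print(imputer.statistics_)

# transform() only applies the stored means; the NaN in row 3 becomes 2.0
X_filled = imputer.transform(X)
print(X_filled[3])
```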

fit_transform() is just a shorthand for combining the two methods. So essentially:

fit(X, y): Learns the required statistics from the supplied data and returns the same imputer object, now holding the learned parameters. It does not change the supplied data in any way.
transform(): Actually transforms the supplied data into the new form, using the parameters learned in fit().
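The relationship between the three methods can be checked directly. The following is a small sketch under the default SimpleImputer settings; the names imputer_a, imputer_b and the boolean flags are chosen here for illustration only.

```python
import numpy as np
from sklearn.impute import SimpleImputer

X = np.array([[1, 1], [2, 1], [3, 2], [np.nan, 2]])

# Two steps: fit() returns the imputer itself, then transform() applies it
imputer_a = SimpleImputer()
returned = imputer_a.fit(X)
same_object = returned is imputer_a
two_step = imputer_a.transform(X)

# One step: fit_transform() on a fresh imputer
imputer_b = SimpleImputer()
one_step = imputer_b.fit_transform(X)

# Both routes give the same filled array, and X still contains its NaN
results_match = np.allclose(two_step, one_step)
input_unchanged = bool(np.isnan(X[3, 0]))
print(same_object, results_match, input_unchanged)
```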

Calling fit_transform(df) before transform() is not required; only fit() needs to be called first. The sequence you described is typically used with a train/test split of the data, something like:

# Combining the learning of parameters from training data and transforming into a single step.
X_train_new = my_imputer.fit_transform(X_train)

# We don't want to learn anything from the test data, only change it according to previously learnt information
X_test_new = my_imputer.transform(X_test)

The above code snippet can be broken into:

# It learns about the data and does nothing else
my_imputer.fit(X_train)

# Calling transform to apply the learnt information on supplied data
X_train_new = my_imputer.transform(X_train)
X_test_new = my_imputer.transform(X_test)
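To see why the test data is only transformed, here is a small sketch with made-up numbers (the X_train and X_test arrays are hypothetical, not from the question). The missing test value is filled with the mean learned from the training data, not with anything computed from the test set.

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Hypothetical single-column data; the train and test values differ on purpose
X_train = np.array([[1.0], [2.0], [3.0], [np.nan]])
X_test = np.array([[100.0], [np.nan]])

my_imputer = SimpleImputer(strategy="mean")

# Learn the mean from the training data (2.0) and fill the training NaN
X_train_new = my_imputer.fit_transform(X_train)

# Reuse the training mean on the test data; the test NaN becomes 2.0, not 100.0
X_test_new = my_imputer.transform(X_test)
print(X_test_new)
```

Calling fit_transform(X_test) instead would recompute the mean from the test set, so the two splits would be imputed with different values.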

Answer 1
0 2019-05-08 09:05:33
