
What is the difference between fit, transform and fit_transform in Python when using sklearn?

from sklearn.preprocessing import Imputer
imputer = Imputer(missing_values='NaN', strategy='mean',axis=0)
imputer = imputer.fit(X[:, 1:3])
X[:, 1:3]=imputer.transform(X[:, 1:3]) 

Can you help me understand what the code above does? I don't know much about Imputer. Kindly help!

The confusing part is fit and transform.
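Note for current scikit-learn versions: `Imputer` was deprecated in 0.20 and removed in 0.22, so the import above fails there. Its replacement is `sklearn.impute.SimpleImputer`, which uses `np.nan` as the missing-value marker and has no `axis` parameter (it always works column by column). A minimal sketch of the same steps, assuming a small made-up numeric `X` (the data is purely illustrative):

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Hypothetical data: column 0 has no gaps, columns 1 and 2 contain missing values
X = np.array([
    [1.0, 10.0, np.nan],
    [2.0, np.nan, 4.0],
    [3.0, 30.0, 8.0],
])

# SimpleImputer replaces Imputer; np.nan marks missing entries, there is no axis argument
imputer = SimpleImputer(missing_values=np.nan, strategy='mean')
imputer = imputer.fit(X[:, 1:3])           # learn the mean of columns 1 and 2
X[:, 1:3] = imputer.transform(X[:, 1:3])   # fill NaNs in those columns with the learned means
print(X)
```

Here column 1 has mean 20 and column 2 has mean 6, so those are the values written into the NaN positions.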

# Here the fit method calculates the required parameters (in this case, the mean of each column)
# and stores them in the imputer object
imputer = imputer.fit(X[:, 1:3])
X[:, 1:3] = imputer.transform(X[:, 1:3])
# imputer.transform actually does the work of replacing NaN with the mean.
# This can be done in one step using fit_transform:
# X[:, 1:3] = imputer.fit_transform(X[:, 1:3])
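To check the "one step" claim, here is a small sketch comparing the two routes on made-up data. It uses `SimpleImputer` (the current replacement for `Imputer`) and assumes the default `strategy='mean'` behaviour:

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Hypothetical data with one NaN in each column
X = np.array([[1.0, np.nan], [3.0, 4.0], [np.nan, 8.0]])

# Two steps: fit learns the column means, transform fills the NaNs with them
two_step = SimpleImputer(strategy='mean').fit(X).transform(X)

# One step: fit_transform does both in a single call
one_step = SimpleImputer(strategy='mean').fit_transform(X)

print(np.array_equal(two_step, one_step))  # True
```

The two results match because `fit_transform` fits on the same data it then transforms.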

Imputer is used to replace missing values. The fit method calculates the parameters, while the transform method changes the data, replacing those NaN values with the mean and outputting a new matrix X. The fit_transform method does both steps in one call.
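The split between "calculates the parameters" and "changes the data" can be seen directly: after `fit`, the learned means sit in the `statistics_` attribute, and `transform` returns a new array. This sketch uses `SimpleImputer` and made-up data; with its default `copy=True`, the input array is left untouched:

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Hypothetical data with one NaN in each column
X = np.array([[1.0, 2.0], [np.nan, 6.0], [5.0, np.nan]])

imputer = SimpleImputer(strategy='mean')
imputer.fit(X)
print(imputer.statistics_)    # the learned column means: [3. 4.]

X_new = imputer.transform(X)  # returns a new array; X itself still contains NaN
print(np.isnan(X).any(), np.isnan(X_new).any())  # True False
```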

# Import the Imputer class
from sklearn.preprocessing import Imputer

# Create a new instance of the Imputer object
# Entries equal to NaN are treated as missing values
# Missing values are replaced by the mean later on
# axis=0 computes the mean of each column; axis=1 would use the mean of each row
imputer = Imputer(missing_values='NaN', strategy='mean',axis=0)

# Fit the imputer to columns 1 and 2 of X (this computes their means)
imputer = imputer.fit(X[:, 1:3])

# Replace columns 1 and 2 of the original matrix X
# with the transformed values (NaN filled in with the column means)
X[:, 1:3]=imputer.transform(X[:, 1:3]) 

I added comments to the code for you; I hope this makes a bit more sense. You need to think of X as a matrix that you have to transform so that it no longer contains any NaN (missing) values.
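Since `axis=0` means "use each column's mean", the statistics learned by `fit` should be exactly what NumPy's `np.nanmean(..., axis=0)` computes. A sketch with made-up data, using `SimpleImputer` (which is always column-wise):

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Hypothetical 3x3 matrix with one NaN per column
X = np.array([[np.nan, 1.0, 7.0],
              [4.0, np.nan, 9.0],
              [8.0, 3.0, np.nan]])

imputer = SimpleImputer(strategy='mean').fit(X)

# Column-wise means ignoring NaN, computed directly with NumPy
col_means = np.nanmean(X, axis=0)
print(np.allclose(imputer.statistics_, col_means))  # True

# After transform, every NaN has been replaced by its column's mean
X_filled = imputer.transform(X)
print(X_filled)
```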

Refer to the documentation for more information.

Your comments tell you the difference. They say that if you don't call imputer.fit, you can't replace NaN values using some strategy, for example the mean or the median, because nothing has been computed yet. To apply this process, you need to call imputer.transform after imputer.fit; you will then have a new dataset without NaN values.
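What happens if `transform` is called first can be checked directly. In this sketch (made-up data, `SimpleImputer` with `strategy='median'`), scikit-learn raises `NotFittedError` because no statistics have been learned yet:

```python
import numpy as np
from sklearn.exceptions import NotFittedError
from sklearn.impute import SimpleImputer

# Hypothetical data: column 1 has a missing value
X = np.array([[1.0, np.nan], [3.0, 4.0]])
imputer = SimpleImputer(strategy='median')

try:
    imputer.transform(X)  # no medians learned yet
    raised = False
except NotFittedError:
    raised = True
print("NotFittedError raised:", raised)  # True

X_filled = imputer.fit(X).transform(X)  # fit first, then transform
print(X_filled)
```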

As far as I understand it, the code first imports a specific class from the library:

from sklearn.preprocessing import Imputer

Then it creates an object of that class, which will handle the data according to the settings we pass in:

imputer = Imputer(missing_values='NaN', strategy='mean',axis=0)

Then it applies the object (as in applying a function to data) to the matrix X.

For example, let an operator e be applied to data d; imputer.fit returns e_d, that is, the imputer with its parameters learned from d:

imputer = imputer.fit(X[:, 1:3])
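In code, "e_d" is simply the same imputer object after fitting: scikit-learn estimators return `self` from `fit`, which is why `imputer = imputer.fit(...)` works. A small check, assuming `SimpleImputer` and made-up data:

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Hypothetical data: column 1 has a missing value
X = np.array([[1.0, np.nan], [3.0, 5.0]])
imputer = SimpleImputer(strategy='mean')

fitted = imputer.fit(X)
print(fitted is imputer)   # True: fit returns the same object, now holding the means
print(fitted.statistics_)  # [2. 5.]
```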

Now imputer.transform computes the value of e_d (it applies the learned means) and assigns the result back to the given matrix:

X[:, 1:3]=imputer.transform(X[:, 1:3])
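To make the "operator" picture concrete, here is a hypothetical, hand-rolled class (`MeanImputer` is not part of scikit-learn) that follows the same convention: `fit` stores the parameters and returns `self`, `transform` applies them, and `fit_transform` chains the two. Because the means are stored, the same fitted object can also fill NaNs in other data using the means learned from the first data. This is a sketch for illustration, not how scikit-learn implements imputation internally:

```python
import numpy as np


class MeanImputer:
    """Hypothetical minimal imputer mimicking scikit-learn's fit/transform convention."""

    def fit(self, X):
        # Learn one parameter per column: the mean of its non-NaN values
        self.means_ = np.nanmean(X, axis=0)
        return self  # returning self allows imputer = imputer.fit(X) and chaining

    def transform(self, X):
        # Apply the stored means: copy X and replace each NaN with its column mean
        X = np.array(X, dtype=float, copy=True)
        rows, cols = np.where(np.isnan(X))
        X[rows, cols] = self.means_[cols]
        return X

    def fit_transform(self, X):
        # Convenience: fit on X, then transform the same X
        return self.fit(X).transform(X)


train = np.array([[1.0, np.nan], [3.0, 4.0], [5.0, 8.0]])
new_data = np.array([[np.nan, np.nan]])

imputer = MeanImputer().fit(train)
print(imputer.transform(train))     # the NaN in column 1 becomes 6.0
print(imputer.transform(new_data))  # uses the means learned from train: [[3. 6.]]
```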
