
What is the difference between fit, transform and fit_transform in Python when using sklearn?

from sklearn.preprocessing import Imputer
imputer = Imputer(missing_values='NaN', strategy='mean',axis=0)
imputer = imputer.fit(X[:, 1:3])
X[:, 1:3]=imputer.transform(X[:, 1:3]) 

Can you help me understand what the above code does? I don't know much about Imputer. Kindly help!

The confusing part is fit and transform.

 # fit calculates the required parameters (in this case, the column means)
 # and stores them in the imputer object
 imputer = imputer.fit(X[:, 1:3])
 # transform does the actual work of replacing NaN with the stored means
 X[:, 1:3] = imputer.transform(X[:, 1:3])
 # Both steps can be done in one call using fit_transform

Imputer is used to replace missing values. The fit method only calculates the parameters (here, the mean of each column), while the fit_transform method calculates them and also applies them: it replaces each NaN with the mean and returns a new matrix X.
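To make the relationship concrete, here is a minimal sketch showing that fit followed by transform gives the same result as fit_transform. It assumes a recent scikit-learn, where the Imputer class from the question has been removed and SimpleImputer from sklearn.impute takes its place; the array X below is made-up data for illustration only.

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Made-up data: each column has exactly one missing value.
X = np.array([[1.0, 2.0],
              [np.nan, 4.0],
              [5.0, np.nan]])

# Two steps: fit learns the column means, transform applies them.
imputer = SimpleImputer(missing_values=np.nan, strategy="mean")
imputer.fit(X)
learned_means = imputer.statistics_   # per-column means, NaN ignored
two_step = imputer.transform(X)

# One step: fit_transform does both on the same data.
one_step = SimpleImputer(missing_values=np.nan, strategy="mean").fit_transform(X)

print(learned_means)
print(np.array_equal(two_step, one_step))
```

The learned means live in the statistics_ attribute after fit, which is what "store it in the imputer object" refers to.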

# Import the Imputer class
from sklearn.preprocessing import Imputer

# Create a new instance of the Imputer object
# missing_values='NaN': entries that are NaN are treated as missing
# strategy='mean': each missing value is later replaced by a mean
# axis=0: impute along columns (axis=1 would impute along rows)
imputer = Imputer(missing_values='NaN', strategy='mean', axis=0)

# Fit the imputer to X (compute the column means)
imputer = imputer.fit(X[:, 1:3])

# Replace the selected columns of the original matrix X
# with their values after the transformation
X[:, 1:3] = imputer.transform(X[:, 1:3])

I added comments to the code for you; I hope it makes a bit more sense now. Think of X as a matrix that you have to transform so that it no longer contains NaN (missing values).
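For readers on a current scikit-learn, the commented code could be sketched as follows. This is an assumption-laden adaptation, not the original poster's code: Imputer was deprecated and later removed, so the sketch uses SimpleImputer from sklearn.impute, which always works column-wise and therefore has no axis argument. The matrix X is hypothetical.

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Hypothetical matrix: column 0 is an id, columns 1 and 2 contain NaN.
X = np.array([[0.0, 44.0, 72000.0],
              [1.0, np.nan, 48000.0],
              [2.0, 30.0, np.nan]])

# NaN entries are treated as missing and replaced by the column mean.
imputer = SimpleImputer(missing_values=np.nan, strategy="mean")

# Fit on columns 1 and 2 only, then write the imputed values back into X.
X[:, 1:3] = imputer.fit_transform(X[:, 1:3])

print(X)
```

Column 0 is left untouched because only the slice X[:, 1:3] is passed to the imputer.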

Refer to the documentation for more information.

Your comments already tell you the difference. Without calling imputer.fit first, you cannot replace the NaN values with a statistic such as the mean or the median, because that statistic has not been computed yet. To apply the replacement, call imputer.transform after imputer.fit; you will then have a new dataset without NaN values.
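The ordering requirement can be checked with a small sketch. It assumes a recent scikit-learn and uses SimpleImputer (the successor of Imputer) with made-up data; calling transform on an unfitted imputer raises NotFittedError from sklearn.exceptions.

```python
import numpy as np
from sklearn.exceptions import NotFittedError
from sklearn.impute import SimpleImputer

# Made-up data with one missing value in the second column.
X = np.array([[1.0, np.nan],
              [3.0, 4.0]])

imputer = SimpleImputer(strategy="median")

# transform before fit: there is no stored median to fill in yet.
try:
    imputer.transform(X)
    failed_before_fit = False
except NotFittedError:
    failed_before_fit = True

# After fit, the same call succeeds and returns data without NaN.
imputer.fit(X)
X_clean = imputer.transform(X)

print(failed_before_fit)
print(X_clean)
```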

As far as I understand it, you first import a specific class from the library:

from sklearn.preprocessing import Imputer

Then you create an object of that class, configured for how you want your data to be handled:

imputer = Imputer(missing_values='NaN', strategy='mean',axis=0)

Next you apply it to the matrix X, in the same sense as applying a function to data.

For example, let an operator e be applied to data d. Imputer.fit sets up this operation ed by learning its parameters from the data, and returns the fitted imputer:

imputer = imputer.fit(X[:, 1:3])

Now Imputer.transform computes the value of ed, which is then assigned back to the given matrix:

X[:, 1:3]=imputer.transform(X[:, 1:3])
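One practical consequence of keeping the two steps separate is that the parameters learned by fit can be reused on other data. The sketch below is only an illustration under assumptions: it uses SimpleImputer from a recent scikit-learn instead of the removed Imputer, and the names X_train and X_new are hypothetical.

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Hypothetical data: means are learned from X_train and reused on X_new.
X_train = np.array([[1.0, 10.0],
                    [3.0, np.nan],
                    [np.nan, 30.0]])
X_new = np.array([[np.nan, np.nan]])

imputer = SimpleImputer(strategy="mean")

# fit returns the imputer itself, which is why the question's code
# can write `imputer = imputer.fit(...)`.
fitted = imputer.fit(X_train)
same_object = fitted is imputer

# transform fills X_new with the means learned from X_train.
X_new_filled = imputer.transform(X_new)

print(same_object)
print(X_new_filled)
```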
