
What exactly does fit() do here?

Basically, I want to know what the fit() function does in general, and especially in the pieces of code below.

I'm taking the Machine Learning AZ Course because I'm pretty new to Machine Learning (I just started). I know some basic concepts, but not the technical side.

CODE 1:

import numpy as np
from sklearn.impute import SimpleImputer

missingvalues = SimpleImputer(missing_values = np.nan, strategy = 'mean', verbose = 0)
missingvalues = missingvalues.fit(X[:, 1:3])
X[:, 1:3] = missingvalues.transform(X[:, 1:3])

Another example where I still have doubts:

CODE 2:

from sklearn.preprocessing import StandardScaler
sc_X = StandardScaler()
print(sc_X)
X_train = sc_X.fit_transform(X_train)
print(X_train)
X_test = sc_X.transform(X_test)

I think that if I understand the general purpose of this function and what it does, I'll be good to go. But I'd certainly like to know what it is doing in this code.

Sklearn uses classes. See the Python documentation for more info about classes in Python. For more info about sklearn in particular, take a look at the sklearn documentation.

Here's a short description of how you are using classes in sklearn.

First you instantiate your sklearn classes with sc_X = StandardScaler() or missingvalues = SimpleImputer(...).

The objects sc_X and missingvalues each have methods. You call a method by typing object_name.method_name(...). For example, you used the fit_transform() method of the sc_X instance when you typed sc_X.fit_transform(...). This method takes your data and returns a scaled version of it. It both fits (determines the scaling parameters, here the mean and standard deviation of each column) and transforms (applies the scaling to) your data. The transform() method transforms new data using the same scaling parameters that were learned from the earlier data.
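To make the fit/transform split concrete, here is a minimal sketch. The toy arrays are made up for illustration and are not from the question; mean_ and scale_ are the attributes where a fitted StandardScaler stores what it learned.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Toy data (made up for illustration): one feature, three training rows.
X_train = np.array([[1.0], [2.0], [3.0]])
X_new = np.array([[4.0]])

sc_X = StandardScaler()
# fit_transform() learns the column mean and standard deviation, then scales.
X_train_scaled = sc_X.fit_transform(X_train)

learned_mean = sc_X.mean_[0]    # mean of the training column
learned_scale = sc_X.scale_[0]  # standard deviation of the training column

# transform() reuses the parameters learned above; it does not re-fit.
X_new_scaled = sc_X.transform(X_new)

# The same result computed by hand from the learned parameters.
manual = (X_new - learned_mean) / learned_scale
```

Here X_new_scaled and manual hold the same numbers, which is the whole point: transform() only applies what fit() stored.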

In the first example, you have separated the fit and transform methods into two separate lines, but the idea is similar -- you first learn the imputation parameters with the fit method, and then you transform your data.

By the way, I think missingvalues = missingvalues.fit(X[:, 1:3]) could be changed to just missingvalues.fit(X[:, 1:3]), because fit() updates the object in place and returns that same object.
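A small sketch of the imputer case, using a made-up array rather than the asker's X: it checks that fit() hands back the same object it was called on, and shows where the learned column means end up (the statistics_ attribute).

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Toy data (made up for illustration) with one missing value per column.
X = np.array([[1.0, 10.0],
              [np.nan, 20.0],
              [3.0, np.nan]])

imputer = SimpleImputer(missing_values=np.nan, strategy="mean")

# fit() learns the per-column means and returns the imputer itself.
returned = imputer.fit(X)
same_object = returned is imputer

# The learned parameters: column means computed while ignoring NaN.
column_means = imputer.statistics_

# transform() fills each NaN with the mean learned for its column.
X_filled = imputer.transform(X)
```

Since same_object is True, reassigning the result of fit() is harmless but unnecessary.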

Here is also a useful reference to check: https://scikit-learn.org/stable/tutorial/basic/tutorial.html

In machine learning, the fit method is always the step where something is learned from the data.

You normally have the following steps:

  1. Separate your data into two or three datasets
  2. Pick one part of your data to learn/train on (normally X_train ) with fit
  3. Use what was learned to predict something for unseen data (normally X_test ) with predict
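The three steps above can be sketched as follows. LinearRegression and the synthetic data are stand-ins chosen only for illustration; any sklearn estimator with fit and predict follows the same pattern.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic data (made up for illustration): y = 2x + 1 with no noise.
X = np.arange(10, dtype=float).reshape(-1, 1)
y = 2.0 * X.ravel() + 1.0

# Step 1: separate the data into a training part and a test part.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

# Step 2: learn from the training part only.
model = LinearRegression()
model.fit(X_train, y_train)

# Step 3: predict for data the model has not seen.
y_pred = model.predict(X_test)
```

Because the toy relationship is exactly linear, y_pred matches y_test here; with real data the two would differ, and that difference is what you evaluate.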

In your first example, missingvalues.fit(X[:, 1:3]), you are training SimpleImputer on your data X, using only columns 1 and 2 (the slice 1:3 excludes index 3). With transform you then use what was learned to overwrite the missing values in those columns.
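A quick sketch of what that slice touches, again with a made-up array: X[:, 1:3] selects columns 1 and 2 only, so any other column keeps its missing values.

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Toy data (made up for illustration): four columns, NaN in columns 1, 2 and 3.
X = np.array([[7.0, 1.0, np.nan, 5.0],
              [8.0, np.nan, 4.0, np.nan],
              [9.0, 3.0, 6.0, 5.0]])

# The stop index of a slice is excluded, so this has two columns.
selected_shape = X[:, 1:3].shape

imputer = SimpleImputer(missing_values=np.nan, strategy="mean")
imputer.fit(X[:, 1:3])                     # learn the means of columns 1 and 2
X[:, 1:3] = imputer.transform(X[:, 1:3])   # overwrite only those two columns

# Column 3 was never passed to the imputer, so its NaN is still there.
untouched_is_nan = np.isnan(X[1, 3])
```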

In your second example, you are training StandardScaler with X_train and using what it learned for both datasets, X_train and X_test. The StandardScaler learns only from X_train, so if it learned that 10 has to be converted to 2, it will convert 10 to 2 in both X_train and X_test.
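To illustrate that last point with made-up numbers: the value 10 appears in both toy sets below and comes out identical after scaling, because the test set is scaled with the mean and standard deviation learned from the training set.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Toy data (made up for illustration); 10.0 appears in both sets.
X_train = np.array([[0.0], [10.0], [20.0]])
X_test = np.array([[10.0], [30.0]])

sc_X = StandardScaler()
X_train_scaled = sc_X.fit_transform(X_train)  # learns from X_train only
X_test_scaled = sc_X.transform(X_test)        # reuses the X_train parameters

value_in_train = X_train_scaled[1, 0]  # scaled 10.0 from the training set
value_in_test = X_test_scaled[0, 0]    # scaled 10.0 from the test set

# The test set is not re-centred on its own mean.
test_mean_after_scaling = X_test_scaled.mean()
```

A side effect worth knowing: the scaled test set generally does not have mean 0, which is expected, since it was never used for fitting.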
