
What's the impact of fit_transform in machine learning?

Original post: 2020-09-03 14:23:54 | Tags: machine-learning, scikit-learn

We usually apply .fit_transform() to X_train and .transform() to X_test.

This is because both come from the same dataset. What if we apply .fit_transform() to X_test as well? How will this affect our model?
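A minimal sketch of the two options in the question, assuming StandardScaler as the transformer and a made-up one-feature dataset (the arrays below are hypothetical; only the names X_train and X_test come from the question). The usual pattern learns the statistics on the training data and reuses them; refitting on the test set learns new statistics from the test rows themselves.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical toy data: one feature; the test rows lie above the training range.
X_train = np.array([[1.0], [2.0], [3.0], [4.0]])
X_test = np.array([[5.0], [6.0]])

# Usual pattern: learn mean and std on the training data only...
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
# ...then apply those same training statistics to the test data.
X_test_scaled = scaler.transform(X_test)

# Refitting on the test set learns a new mean and std from the test rows.
X_test_refit = StandardScaler().fit_transform(X_test)

print(X_test_scaled.ravel())  # about 2.24 and 3.13: clearly above the training mean
print(X_test_refit.ravel())   # -1 and 1: re-centered as if they were typical rows
```

With the refit, the model receives test inputs whose scaled values no longer mean the same thing as the training inputs it was fitted on.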

1 answer

For example, if you're using a SimpleImputer to fill numeric missing values with the mean, each time you call its fit_transform method it:

1. calculates the mean of each variable;
2. replaces the missing values with that calculated mean.
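The two steps can be sketched with a small made-up array (the data is hypothetical). After the fit step, SimpleImputer stores the learned per-column means in its statistics_ attribute; the transform step then fills each NaN with its column's stored mean.

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Hypothetical data: two numeric columns, one missing value in each.
X_train = np.array([[1.0, 10.0],
                    [np.nan, 20.0],
                    [3.0, np.nan]])

imputer = SimpleImputer(strategy="mean")
X_filled = imputer.fit_transform(X_train)

# Step 1 (fit): the per-column means of the observed values are stored.
print(imputer.statistics_)  # 2.0 and 15.0
# Step 2 (transform): each NaN is replaced by its column's mean.
print(X_filled)
```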

Now, if you apply fit_transform to both train and test, you will generally get two different means for each variable, so the train and test data end up being processed in two different ways.
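A minimal sketch of that difference, using hypothetical one-column arrays: the same kind of missing value is filled with the training mean when the fitted imputer is reused, but with the test mean when fit_transform is called on the test set.

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Hypothetical splits of one variable.
X_train = np.array([[1.0], [2.0], [3.0], [np.nan]])  # observed mean = 2.0
X_test = np.array([[10.0], [np.nan]])                # observed mean = 10.0

# Consistent processing: learn the mean on train, reuse it on test.
imputer = SimpleImputer(strategy="mean").fit(X_train)
test_consistent = imputer.transform(X_test)  # missing value filled with 2.0

# Refitting on test: the missing value is filled with the test mean instead.
test_refit = SimpleImputer(strategy="mean").fit_transform(X_test)  # filled with 10.0
```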

Moreover, there is a less statistical, more practical issue. If you deploy this process in production and apply it to a single record, which mean will you use: the training one or the test one? Or would you call fit_transform on that record too, computing the mean of a single value?
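One common way to avoid that question, sketched here under assumptions: fit the imputer once on the training data, persist it together with the model, and call only transform on incoming records. pickle is used below just for brevity, and the data and record are hypothetical.

```python
import pickle

import numpy as np
from sklearn.impute import SimpleImputer

# Hypothetical training data; the learned means are 2.0 and 20.0.
X_train = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
imputer = SimpleImputer(strategy="mean").fit(X_train)

# Persist the fitted imputer (in practice, alongside the trained model).
blob = pickle.dumps(imputer)

# Later, in production: load it and transform a single incoming record.
served = pickle.loads(blob)
record = np.array([[np.nan, 25.0]])  # one row, first feature missing
filled = served.transform(record)    # NaN replaced by the training mean 2.0
```

Wrapping the imputer and the model in a scikit-learn Pipeline is another way to keep the fitted preprocessing and the model together.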

Related questions

Using fit_transform() and transform()

How fit_transform, transform and TfidfVectorizer works

Difference between transform and fit_transform

How vectorizer fit_transform work in sklearn?

Getting Error on StandardScalar Fit_Transform

CountVectorizer takes too long to fit_transform

quadratic featurizer: preprocessing with fit_transform

Scikit learn - fit_transform on the test set

fit_transform data before running algorithm

fit_transform PCA inconsistent results


The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint them, please cite the site URL or the original address. For any questions, contact: yoyou2525@163.com.


Solution 1 (score 0, accepted), 2020-09-03 14:50:39
