What's the impact of fit_transform in machine learning?
We usually apply .fit_transform() on X_train and .transform() on X_test. This is because they are from the same dataset. What if we apply fit_transform() to X_test again? How will this affect our model?
For example, if you're applying a SimpleImputer to impute numeric missing values with the mean, each time you call the fit_transform method you are fitting the imputer (computing the mean of each variable from the data you pass in) and then transforming that data (replacing its missing values with those means).
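As a minimal sketch of those two steps, the snippet below fits a SimpleImputer on a small made-up array (the values are purely illustrative, not from the question) and checks that fit_transform gives the same result as calling fit and then transform:

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Toy training data, made up for illustration; np.nan marks missing values.
X_train = np.array([[1.0, 10.0],
                    [3.0, np.nan],
                    [np.nan, 30.0]])

imputer = SimpleImputer(strategy="mean")

# fit_transform = fit (learn each column's mean) + transform (fill the gaps).
X_filled = imputer.fit_transform(X_train)

# The means learned during fit are stored on the fitted object.
learned_means = imputer.statistics_

# Running the two steps separately on the same data gives the same result.
two_step = SimpleImputer(strategy="mean").fit(X_train).transform(X_train)
```

Here the learned means are 2.0 and 20.0, and those are the values written into the missing cells.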
Now, if you apply fit_transform to both train and test, you could get two different means for each variable, so the train and test sets would be processed in two different ways.
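To illustrate the difference, here is a small sketch with invented numbers in which the train and test columns have deliberately different means. The same missing test value is filled differently depending on whether the imputer is reused or refit:

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Invented single-feature data; the two sets have different means on purpose.
X_train = np.array([[1.0], [2.0], [3.0], [np.nan]])  # observed mean = 2.0
X_test = np.array([[10.0], [np.nan], [20.0]])        # observed mean = 15.0

# Usual approach: learn the mean on train, then reuse it on test.
imputer = SimpleImputer(strategy="mean")
imputer.fit_transform(X_train)
test_reused = imputer.transform(X_test)    # missing value filled with 2.0

# Refitting on test: the mean is recomputed from the test data itself.
refit_imputer = SimpleImputer(strategy="mean")
test_refit = refit_imputer.fit_transform(X_test)  # missing value filled with 15.0
```

The model was trained on data imputed with 2.0, so feeding it test data imputed with 15.0 means it is evaluated on inputs prepared by a different process.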
Moreover, here's another less statistical, more practical issue. If you deploy the process in production and apply it to a single record, which mean will you use? The train one or the test one? Or would you also apply fit_transform to that record, calculating the mean of a single value?
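One way to picture the production case is sketched below. It assumes the fitted imputer is saved and reloaded (pickle is used here only to simulate deployment, and the data is made up), so the single record is filled with the means learned from the training set:

```python
import pickle

import numpy as np
from sklearn.impute import SimpleImputer

# Made-up training data with one missing value in the first column.
X_train = np.array([[1.0, 10.0],
                    [3.0, 30.0],
                    [np.nan, 20.0]])

# Fit once, on the training data only.
imputer = SimpleImputer(strategy="mean").fit(X_train)

# Simulate shipping the fitted imputer to production (an assumption of this
# sketch; any persistence mechanism that keeps the fitted state would do).
deployed = pickle.loads(pickle.dumps(imputer))

# A single incoming record with a missing first feature.
record = np.array([[np.nan, 12.0]])

# transform only: the gap is filled with the training mean of column 0 (2.0).
filled = deployed.transform(record)
```

Calling fit_transform on the record instead would leave only that one row to compute a mean from, and a missing value would have nothing to be averaged with.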