
Is there any reason to do .fit() and .transform() instead of just .fit_transform()?

Original post: 2020-06-18 13:02:51 | Tags: python, python-3.x, scikit-learn

I just started learning ML and wondered why one would call .fit() and .transform() separately when .fit_transform() exists. Also, I am generally confused about what exactly fitting (.fit()) does.

1 Answer

I assume you are talking about sklearn's scalers, or sklearn's feature transformation algorithms in general.

Let's say your dataset is split into 5 subsets and you want to scale all of them to the range -1 to 1:

You fit your scaler once on all of the data using fit; for a min-max scaler, this basically finds the minimum and maximum over all of your subsets and stores them in the scaler.
Then, you can scale each subset using transform, which applies those stored values without changing them.
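
The two steps above can be sketched as follows. This is only an illustration: the answer does not name a specific scaler, so MinMaxScaler with feature_range=(-1, 1) is an assumption, and the data is made up.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Made-up data: 10 samples with a single feature, values 0..9.
data = np.arange(10, dtype=float).reshape(-1, 1)

# Split into 5 subsets of 2 samples each.
subsets = np.array_split(data, 5)

# Assumption: a min-max scaler targeting the range [-1, 1].
scaler = MinMaxScaler(feature_range=(-1, 1))

# Step 1: fit once on everything, so the scaler learns the global
# minimum (0) and maximum (9).
scaler.fit(data)

# Step 2: transform each subset with the same learned min/max.
scaled_subsets = [scaler.transform(subset) for subset in subsets]
```

Because every subset is transformed with the same stored minimum and maximum, the smallest value in the whole dataset maps to -1 and the largest to 1, regardless of which subset it falls in.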

If you had used fit_transform on the first subset and then used it again on the second one, the scaler would have been refitted each time. Each subset would then be scaled by its own minimum and maximum, so the same raw value could end up with different scaled values in different subsets, and you don't want that.
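
A small sketch of that problem, again assuming MinMaxScaler and made-up numbers. The raw value 5.0 appears in both subsets but is scaled differently, because each fit_transform call refits the scaler:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Two made-up subsets that both contain the raw value 5.0.
first = np.array([[0.0], [5.0], [10.0]])
second = np.array([[5.0], [50.0], [100.0]])

scaler = MinMaxScaler(feature_range=(-1, 1))

# Each call refits the scaler on the subset it is given.
first_scaled = scaler.fit_transform(first)    # fitted on min=0, max=10
second_scaled = scaler.fit_transform(second)  # refitted on min=5, max=100

# The same raw value 5.0 lands in two different places:
value_in_first = first_scaled[1, 0]    # roughly 0.0 (middle of its subset)
value_in_second = second_scaled[0, 0]  # roughly -1.0 (minimum of its subset)
```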

Moreover, instead of subsets, you can think of fitting once on your training set and keeping the fitted transformation in memory, so that you can scale future samples you want to pass to your model in exactly the same way.
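
As a hedged sketch of that training-set workflow (MinMaxScaler and the variable names X_train and X_new are assumptions chosen for illustration): fit_transform is convenient on the training data, while later samples only get transform.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Made-up training data and later, unseen samples.
X_train = np.array([[1.0], [3.0], [5.0]])
X_new = np.array([[2.0], [6.0]])

scaler = MinMaxScaler(feature_range=(-1, 1))

# On the training set, fit_transform is shorthand for fit followed by
# transform: it learns min=1 and max=5, then scales X_train.
X_train_scaled = scaler.fit_transform(X_train)

# For future samples, only transform: reuse the training min/max.
X_new_scaled = scaler.transform(X_new)
```

Note that 6.0 lies outside the training range, so with the default settings it is mapped outside [-1, 1] (to about 1.5). That is expected: the new sample is being measured against the training data rather than against itself.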