Why does fit_transform not work in this sklearn Pipeline example?

Question

I am new to sklearn Pipeline and am following some sample code. I saw in other examples that we can call pipeline.fit_transform(train_X), so I tried the same thing on the pipeline here with pipeline.fit_transform(X), but it gave me an error:

"    return self.fit(X, **fit_params).transform(X)

TypeError: fit() takes exactly 3 arguments (2 given)"

If I remove the svm part and define the pipeline as pipeline = Pipeline([("features", combined_features)]), I still see the error.

Does anyone know why fit_transform doesn't work here?

from sklearn.pipeline import Pipeline, FeatureUnion
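# sklearn.grid_search was removed in scikit-learn 0.20; GridSearchCV now lives in model_selection
from sklearn.model_selection import GridSearchCV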

from sklearn.svm import SVC
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest

iris = load_iris()

X, y = iris.data, iris.target

# This dataset is way too high-dimensional. Better do PCA:
pca = PCA(n_components=2)

# Maybe some original features were good, too?
selection = SelectKBest(k=1)

# Build estimator from PCA and Univariate selection:

combined_features = FeatureUnion([("pca", pca), ("univ_select", selection)])

# Use combined features to transform dataset:
X_features = combined_features.fit(X, y).transform(X)

svm = SVC(kernel="linear")

# Do grid search over k, n_components and C:

pipeline = Pipeline([("features", combined_features), ("svm", svm)])

param_grid = dict(features__pca__n_components=[1, 2, 3],
                  features__univ_select__k=[1, 2],
                  svm__C=[0.1, 1, 10])

grid_search = GridSearchCV(pipeline, param_grid=param_grid, verbose=10)
grid_search.fit(X, y)
print(grid_search.best_estimator_)

Answer 1

You get an error in the above example because you also need to pass the labels to your pipeline: you should be calling pipeline.fit_transform(X, y). The last step in your pipeline is a classifier, SVC, and the fit method of a classifier requires the labels as a mandatory argument, because the classification algorithm uses them to train the weights of the classifier.
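As a minimal sketch of passing the labels to the full pipeline (it rebuilds the question's pipeline; nothing here is new API): fit(X, y) hands y to every step, including the final SVC. One caveat worth checking against your scikit-learn version: as far as I know SVC has no transform method, so for a pipeline that ends in the classifier this sketch uses fit followed by predict rather than fit_transform.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest
from sklearn.pipeline import FeatureUnion, Pipeline
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

combined_features = FeatureUnion([
    ("pca", PCA(n_components=2)),
    ("univ_select", SelectKBest(k=1)),
])
pipeline = Pipeline([
    ("features", combined_features),
    ("svm", SVC(kernel="linear")),
])

# y is forwarded to each step's fit, so both SelectKBest and SVC receive it.
pipeline.fit(X, y)

# The final step is a classifier, so the fitted pipeline predicts labels.
predictions = pipeline.predict(X)
print(predictions.shape)
```

The same fitted pipeline can then be scored with pipeline.score(X, y), which also needs the labels.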

Similarly, even if you remove the SVC, you still get an error because the fit method of the SelectKBest class also requires both X and y: it scores each feature against the labels in order to pick the best ones.
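A minimal sketch of the transformer-only case, assuming the same FeatureUnion as in the question (the name features_only is illustrative): once y is passed, fit_transform works because every step is a transformer and SelectKBest gets the labels it needs.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest
from sklearn.pipeline import FeatureUnion, Pipeline

X, y = load_iris(return_X_y=True)

combined_features = FeatureUnion([
    ("pca", PCA(n_components=2)),
    ("univ_select", SelectKBest(k=1)),
])

# Pipeline containing only the feature-extraction step (no classifier).
features_only = Pipeline([("features", combined_features)])

# Passing y lets SelectKBest score each feature against the labels.
X_features = features_only.fit_transform(X, y)

# Two PCA components plus one selected original feature per sample.
print(X_features.shape)
```

PCA ignores y, so passing it to the whole pipeline is harmless for the unsupervised step.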

Answer 1
0 2016-06-28 05:33:13
