Why doesn't fit_transform work in this sklearn Pipeline example?
I am new to sklearn Pipeline and am following some sample code. I saw in other examples that we can call pipeline.fit_transform(train_X), so I tried the same thing on the pipeline here, pipeline.fit_transform(X), but it gave me an error:
    return self.fit(X, **fit_params).transform(X)
TypeError: fit() takes exactly 3 arguments (2 given)
If I remove the SVM step and define the pipeline as pipeline = Pipeline([("features", combined_features)]), I still see the error.
Does anyone know why fit_transform doesn't work here?
from sklearn.pipeline import Pipeline, FeatureUnion
from sklearn.model_selection import GridSearchCV  # sklearn.grid_search was removed in scikit-learn 0.20
from sklearn.svm import SVC
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest
iris = load_iris()
X, y = iris.data, iris.target
# This dataset is way too high-dimensional. Better do PCA:
pca = PCA(n_components=2)
# Maybe some original features were good, too?
selection = SelectKBest(k=1)
# Build estimator from PCA and Univariate selection:
combined_features = FeatureUnion([("pca", pca), ("univ_select", selection)])
# Use combined features to transform dataset:
X_features = combined_features.fit(X, y).transform(X)
svm = SVC(kernel="linear")
# Do grid search over k, n_components and C:
pipeline = Pipeline([("features", combined_features), ("svm", svm)])
param_grid = dict(features__pca__n_components=[1, 2, 3],
features__univ_select__k=[1, 2],
svm__C=[0.1, 1, 10])
grid_search = GridSearchCV(pipeline, param_grid=param_grid, verbose=10)
grid_search.fit(X, y)
print(grid_search.best_estimator_)
You get an error in the above example because you also need to pass the labels to your pipeline. You should be calling pipeline.fit_transform(X, y). The last step in your pipeline is a classifier, SVC, and the fit method of a classifier also requires the labels as a mandatory argument. The fit method of every classifier requires labels, because the classification algorithm uses them to train the weights of the classifier. (Note that SVC has no transform method, so for the full pipeline you would train with pipeline.fit(X, y) and then call pipeline.predict(...); fit_transform(X, y) is the right call for a pipeline whose last step is a transformer, such as the features-only one.)
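A minimal sketch of training the full pipeline from the question with the labels passed in. It assumes a recent scikit-learn (load_iris(return_X_y=True) needs 0.18 or later) and uses fit/predict/score rather than fit_transform, since the last step is a classifier.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest
from sklearn.pipeline import FeatureUnion, Pipeline
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

combined_features = FeatureUnion([("pca", PCA(n_components=2)),
                                  ("univ_select", SelectKBest(k=1))])
pipeline = Pipeline([("features", combined_features),
                     ("svm", SVC(kernel="linear"))])

# Both SelectKBest and SVC are supervised, so y must be given at fit time.
pipeline.fit(X, y)

# A pipeline ending in a classifier is used through predict/score, not transform.
predictions = pipeline.predict(X)
train_accuracy = pipeline.score(X, y)
print(predictions.shape, round(train_accuracy, 3))
```

The accuracy here is measured on the training data only; the GridSearchCV call in the question is what gives a cross-validated estimate.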
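Similarly, even if you remove the SVC, you still get an error, because the fit method of the SelectKBest class also requires both X and y: it scores each feature against the labels (with f_classif by default). The line in your traceback, return self.fit(X, **fit_params).transform(X), is the branch of the generic fit_transform that runs when no y is passed, so SelectKBest.fit receives only X.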
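A small sketch contrasting SelectKBest without and with labels, then running the features-only pipeline from the question with y supplied. The exception type for the missing-y case differs between scikit-learn versions (older ones raise a TypeError from the fit signature, newer ones a ValueError from input validation), so the sketch catches both.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest
from sklearn.pipeline import FeatureUnion, Pipeline

X, y = load_iris(return_X_y=True)

# Without y, SelectKBest has nothing to score the features against.
try:
    SelectKBest(k=1).fit_transform(X)
    failed_without_y = False
except (TypeError, ValueError):
    failed_without_y = True

# The features-only pipeline works once y is passed through to SelectKBest.
features_only = Pipeline([
    ("features", FeatureUnion([("pca", PCA(n_components=2)),
                               ("univ_select", SelectKBest(k=1))])),
])
X_features = features_only.fit_transform(X, y)
print(failed_without_y, X_features.shape)  # True (150, 3)
```

The output has 3 columns: 2 from PCA plus the 1 feature kept by SelectKBest. PCA simply ignores the y it receives.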