Python sklearn: fit_transform() does not work with GridSearchCV

Question

I am creating a GridSearchCV classifier as follows:

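from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

pipeline = Pipeline([
    ('vect', TfidfVectorizer(stop_words='english', sublinear_tf=True)),
    ('clf', LogisticRegression())
])

parameters = {}

gridSearchClassifier = GridSearchCV(pipeline, parameters, n_jobs=3, verbose=1, scoring='accuracy')
# Fit/train the gridSearchClassifier on the training set
gridSearchClassifier.fit(Xtrain, ytrain)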

This works well, and I can predict. However, now I want to retrain the classifier. For this I want to do a fit_transform() on some feedback data.

gridSearchClassifier.fit_transform(Xnew, yNew)

But I get this error:

AttributeError: 'GridSearchCV' object has no attribute 'fit_transform'

Basically, I am trying to call fit_transform() on the classifier's internal TfidfVectorizer. I know that I can access the Pipeline's internal components using the named_steps attribute. Can I do something similar for the gridSearchClassifier?

Answer 1

Just call them step by step.

gridSearchClassifier.fit(Xnew, yNew)
# transform() is only available if the pipeline's final step implements transform()
transformed = gridSearchClassifier.transform(Xnew)

fit_transform is nothing more than these two lines of code; it is simply not implemented as a single method for GridSearchCV.
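In recent scikit-learn releases, a pipeline whose last step is a classifier such as LogisticRegression does not expose transform(), so the second line above can fail. The following is a minimal sketch of one workaround, assuming a scikit-learn version in which Pipeline supports slicing. The toy documents, labels, and the C grid are invented for illustration.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

# Toy data, invented for this example.
docs = ["good movie", "great film", "nice plot", "fine acting",
        "bad movie", "awful film", "poor plot", "weak acting"]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

pipeline = Pipeline([("vect", TfidfVectorizer()), ("clf", LogisticRegression())])
search = GridSearchCV(pipeline, {"clf__C": [0.1, 1.0]}, cv=2)
search.fit(docs, labels)

# Slice off the final classifier: what remains is the fitted vectorizer step.
feature_steps = search.best_estimator_[:-1]
features = feature_steps.transform(["great plot", "weak film"])
print(features.shape)
```

The slice reuses the already fitted vectorizer, so nothing is refit and the resulting columns match what the classifier was trained on.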

Update

From the comments, it seems you are a bit lost about what GridSearchCV actually does. It is a meta-method for fitting a model over multiple hyperparameter settings. Thus, once you call fit, you get an estimator in the best_estimator_ attribute of your object. In your case it is a pipeline, and you can extract any part of it as usual:

gridSearchClassifier.fit(Xtrain, ytrain)
clf = gridSearchClassifier.best_estimator_
# do something with clf, its elements, etc.
# for example: print(clf.named_steps['vect'])
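As a self-contained illustration of the point above, here is a small sketch that runs a grid search and inspects the result. The toy documents, labels, and the clf__C grid are assumptions made up for the example, not part of the original question.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

# Toy data, invented for this example.
docs = ["good movie", "great film", "nice plot", "fine acting",
        "bad movie", "awful film", "poor plot", "weak acting"]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

pipeline = Pipeline([("vect", TfidfVectorizer()), ("clf", LogisticRegression())])
# Step parameters are addressed as <step name>__<parameter name>.
param_grid = {"clf__C": [0.1, 1.0, 10.0]}
search = GridSearchCV(pipeline, param_grid, cv=2, scoring="accuracy")
search.fit(docs, labels)

best = search.best_estimator_          # a fitted Pipeline
print(type(best).__name__)
print(search.best_params_)             # the winning hyperparameters
print(sorted(best.named_steps))        # names of the pipeline steps
print(hasattr(search, "fit_transform"))
```

The search object is only a wrapper; the fitted vectorizer and classifier live inside best_estimator_.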

You should not use GridSearchCV as a classifier; it is only a method for fitting hyperparameters. Once you have found them, you should work with best_estimator_ instead. However, remember that if you refit the TF-IDF vectorizer, your classifier will be useless; you cannot change the data representation and expect the old model to work well. You have to refit the whole classifier once your data change (unless the change is carefully designed and you make sure the old dimensions mean exactly the same thing; sklearn does not support such operations, so you would have to implement this from scratch).
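To make the warning concrete, the sketch below refits a vectorizer on feedback data alone and shows that the old classifier no longer accepts the result. It then retrains the vectorizer and classifier together on the combined data, which is one simple way to stay consistent. All documents and labels are toy values invented for illustration.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

# Toy data, invented for this example.
train_docs = ["good movie", "great film", "bad movie", "awful film"]
train_labels = [1, 1, 0, 0]
feedback_docs = ["wonderful acting and plot", "terrible plot"]
feedback_labels = [1, 0]

vect = TfidfVectorizer()
X_train = vect.fit_transform(train_docs)
clf = LogisticRegression().fit(X_train, train_labels)

# Refitting the vectorizer on the feedback alone rebuilds the vocabulary,
# so the columns no longer match what clf was trained on.
X_feedback = vect.fit_transform(feedback_docs)
try:
    clf.predict(X_feedback)
    mismatch = False
except ValueError:
    mismatch = True
print(X_train.shape[1], X_feedback.shape[1], mismatch)

# Retraining both steps together on all data keeps them consistent.
retrained = Pipeline([("vect", TfidfVectorizer()), ("clf", LogisticRegression())])
retrained.fit(train_docs + feedback_docs, train_labels + feedback_labels)
predictions = retrained.predict(feedback_docs)
```

Even if the two vocabularies happened to have the same size, the columns would stand for different words, so the old coefficients would still be meaningless.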

Answer 2

@lejot is correct that you should call fit() on the gridSearchClassifier.

Provided refit=True is set on the GridSearchCV, which is the default, you can access best_estimator_ on the fitted gridSearchClassifier.
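The effect of the refit flag can be checked directly. This is a small sketch with invented toy data and an assumed clf__C grid; it compares a default search with one created with refit=False.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

# Toy data, invented for this example.
docs = ["good movie", "great film", "nice plot", "fine acting",
        "bad movie", "awful film", "poor plot", "weak acting"]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

pipeline = Pipeline([("vect", TfidfVectorizer()), ("clf", LogisticRegression())])
param_grid = {"clf__C": [0.1, 1.0]}

# Default behaviour (refit=True): the best pipeline is retrained on all data.
search_default = GridSearchCV(pipeline, param_grid, cv=2).fit(docs, labels)

# refit=False: the scores are computed, but no final model is trained.
search_no_refit = GridSearchCV(pipeline, param_grid, cv=2, refit=False).fit(docs, labels)

print(hasattr(search_default, "best_estimator_"))
print(hasattr(search_no_refit, "best_estimator_"))
print(search_no_refit.best_params_)
```

With refit=False the best parameters are still reported, so a final model can be built from them manually if needed.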

You can access the already fitted steps:

tfidf = gridSearchClassifier.best_estimator_.named_steps['vect']
clf = gridSearchClassifier.best_estimator_.named_steps['clf']

You can then transform new text in new_X using:

X_vec = tfidf.transform(new_X)

You can make predictions using this X_vec with:

x_pred = clf.predict(X_vec)

You can also make predictions for text by passing it through the entire pipeline with:

X_pred = gridSearchClassifier.predict(new_X)
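The two prediction routes above should agree, since the search object delegates predict() to the refitted pipeline. The following sketch checks this on invented toy data; the documents, labels, and the clf__C grid are assumptions for illustration only.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

# Toy data, invented for this example.
docs = ["good movie", "great film", "nice plot", "fine acting",
        "bad movie", "awful film", "poor plot", "weak acting"]
labels = [1, 1, 1, 1, 0, 0, 0, 0]
new_docs = ["great movie", "awful plot"]

pipeline = Pipeline([("vect", TfidfVectorizer()), ("clf", LogisticRegression())])
search = GridSearchCV(pipeline, {"clf__C": [0.1, 1.0]}, cv=2)
search.fit(docs, labels)

# Route 1: apply the fitted steps by hand.
tfidf = search.best_estimator_.named_steps["vect"]
clf = search.best_estimator_.named_steps["clf"]
step_pred = clf.predict(tfidf.transform(new_docs))

# Route 2: let the search object run the whole pipeline.
pipe_pred = search.predict(new_docs)

print(np.array_equal(step_pred, pipe_pred))
```

Note that the manual route calls transform(), not fit_transform(), so the vectorizer's vocabulary stays the one the classifier was trained with.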

Answer 1: score 4, accepted, 2015-12-31 15:51:36
Answer 2: score 1, 2015-12-31 16:01:25
