Python sklearn: fit_transform() does not work for GridSearchCV
I am creating a GridSearchCV classifier as follows:
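from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

pipeline = Pipeline([
    ('vect', TfidfVectorizer(stop_words='english', sublinear_tf=True)),
    ('clf', LogisticRegression())
])
parameters = {}
gridSearchClassifier = GridSearchCV(pipeline, parameters, n_jobs=3, verbose=1, scoring='accuracy')
# Fit/train the gridSearchClassifier on Training Set
gridSearchClassifier.fit(Xtrain, ytrain)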
This works well, and I can predict. However, now I want to retrain the classifier. For this I want to do a fit_transform() on some feedback data.
gridSearchClassifier.fit_transform(Xnew, yNew)
But I get this error:
AttributeError: 'GridSearchCV' object has no attribute 'fit_transform'
Basically, I am trying to call fit_transform() on the classifier's internal TfidfVectorizer. I know that I can access the Pipeline's internal components using the named_steps attribute. Can I do something similar for the gridSearchClassifier?
Just call them step by step:
gridSearchClassifier.fit(Xnew, yNew)
transformed = gridSearchClassifier.transform(Xnew)
fit_transform is nothing more than these two lines of code; it is simply not implemented as a single method on GridSearchCV. Note that GridSearchCV only exposes transform when the underlying estimator supports it and refit=True.
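As a rough illustration of the point above, the sketch below checks which methods a GridSearchCV object exposes and then obtains the vectorized features from the fitted TfidfVectorizer. The toy corpus, the clf__C grid, and cv=2 are assumptions made up for this example; they are not part of the original question.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

# Toy corpus invented for illustration; not from the original question.
docs = [
    "great movie", "wonderful film", "great acting", "wonderful cast",
    "terrible movie", "awful film", "terrible acting", "awful cast",
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

pipeline = Pipeline([
    ("vect", TfidfVectorizer(stop_words="english", sublinear_tf=True)),
    ("clf", LogisticRegression()),
])
# Hypothetical parameter grid and fold count, chosen to suit the tiny corpus.
search = GridSearchCV(pipeline, {"clf__C": [1.0, 10.0]}, cv=2, scoring="accuracy")

# GridSearchCV defines no fit_transform method, hence the AttributeError.
has_fit_transform = hasattr(search, "fit_transform")
print("fit_transform available:", has_fit_transform)

search.fit(docs, labels)

# Whether transform is exposed depends on the final pipeline step
# and on the installed scikit-learn version.
print("transform available:", hasattr(search, "transform"))

# The fitted vectorizer inside the refitted pipeline can always transform text.
features = search.best_estimator_.named_steps["vect"].transform(docs)
print(features.shape)
```

If transform is not exposed in your setup, going through best_estimator_.named_steps as in the last lines is one way to get the TF-IDF matrix.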
From the comments, it seems that you are a bit lost about what GridSearchCV actually does. It is a meta-method for fitting a model with multiple hyperparameters. Thus, once you call fit, you get an estimator in the best_estimator_ field of your object. In your case it is a pipeline, and you can extract any part of it as usual, thus:
gridSearchClassifier.fit(Xtrain, ytrain)
clf = gridSearchClassifier.best_estimator_
# do something with clf, its elements etc.
# for example: print(clf.named_steps['vect'])
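A minimal, self-contained sketch of this idea is shown below. The training sentences, labels, the clf__C grid, and cv=2 are hypothetical values picked only so the example runs on its own.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

# Toy data, invented purely for illustration.
X_train = [
    "great movie with wonderful acting",
    "wonderful story and great cast",
    "loved this great film",
    "brilliant and wonderful picture",
    "terrible movie with awful acting",
    "awful story and boring cast",
    "hated this terrible film",
    "boring and awful picture",
]
y_train = [1, 1, 1, 1, 0, 0, 0, 0]

pipeline = Pipeline([
    ("vect", TfidfVectorizer(stop_words="english", sublinear_tf=True)),
    ("clf", LogisticRegression()),
])
param_grid = {"clf__C": [0.1, 1.0, 10.0]}  # hypothetical grid

search = GridSearchCV(pipeline, param_grid, cv=2, scoring="accuracy")
search.fit(X_train, y_train)

# With refit=True (the default), best_estimator_ is the pipeline
# refitted on all of X_train with the best parameters found.
best = search.best_estimator_
vect = best.named_steps["vect"]
clf = best.named_steps["clf"]

print(type(best).__name__)
print(search.best_params_)
print(len(vect.vocabulary_), clf.coef_.shape)
```

The classifier's coefficient matrix has one column per vocabulary entry of the fitted vectorizer, which is why the two steps have to be kept in sync.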
You should not use GridSearchCV as a classifier; it is only a method of fitting hyperparameters. Once you have found them, you should work with best_estimator_ instead. However, remember that if you refit the TF-IDF vectorizer, then your classifier will be useless; you cannot change the data representation and expect the old model to work well. You have to refit the whole classifier once your data changes (unless this is a carefully designed change and you make sure the old dimensions mean exactly the same thing; sklearn does not support such operations, so you would have to implement this from scratch).
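To make the warning concrete, here is a small sketch, with made-up documents and labels, of what happens when only the vectorizer is refitted: the vocabulary changes, so the old classifier no longer matches the feature space. Even if the number of columns happened to agree, each column would stand for a different word. The second half shows one possible way to retrain, refitting the whole pipeline on old plus new data; this is an assumption about your workflow, not the only option.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

# Hypothetical original training data and later feedback data.
old_docs = ["great movie", "wonderful film", "terrible movie", "awful film"]
old_labels = [1, 1, 0, 0]
new_docs = ["superb soundtrack", "dreadful pacing", "superb pacing"]
new_labels = [1, 0, 1]

vect = TfidfVectorizer()
clf = LogisticRegression()
clf.fit(vect.fit_transform(old_docs), old_labels)
n_old = len(vect.vocabulary_)

# Refit ONLY the vectorizer on the feedback data: the vocabulary is replaced.
vect.fit(new_docs)
n_new = len(vect.vocabulary_)

try:
    clf.predict(vect.transform(new_docs))
    mismatch = False
except ValueError as exc:
    # The old classifier expects the old feature space.
    mismatch = True
    print("old classifier rejected the new features:", exc)

# One way to retrain: refit vectorizer and classifier together.
pipe = Pipeline([("vect", TfidfVectorizer()), ("clf", LogisticRegression())])
pipe.fit(old_docs + new_docs, old_labels + new_labels)
print(pipe.predict(["superb movie"]))
```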
@lejot is correct that you should call fit() on the gridSearchClassifier.
Provided refit=True is set on the GridSearchCV, which is the default, you can access best_estimator_ on the fitted gridSearchClassifier.
You can access the already fitted steps:
tfidf = gridSearchClassifier.best_estimator_.named_steps['vect']
clf = gridSearchClassifier.best_estimator_.named_steps['clf']
You can then transform new text in new_X using:
X_vec = tfidf.transform(new_X)
You can make predictions using this X_vec with:
x_pred = clf.predict(X_vec)
You can also make predictions for the text by passing it through the entire pipeline with:
X_pred = gridSearchClassifier.predict(new_X)
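Putting the snippets above together, the following sketch checks that the step-by-step route (transform, then predict) gives the same labels as calling predict on the grid search object. The corpus, the clf__C grid, cv=2, and the contents of new_X are assumptions made up for illustration.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

# Toy training data, invented for this example.
Xtrain = [
    "great movie", "wonderful film", "great acting", "wonderful cast",
    "terrible movie", "awful film", "terrible acting", "awful cast",
]
ytrain = [1, 1, 1, 1, 0, 0, 0, 0]

pipeline = Pipeline([
    ("vect", TfidfVectorizer(stop_words="english", sublinear_tf=True)),
    ("clf", LogisticRegression()),
])
# Hypothetical grid; refit=True is the default and is spelled out for clarity.
gridSearchClassifier = GridSearchCV(
    pipeline, {"clf__C": [1.0, 10.0]}, cv=2, scoring="accuracy", refit=True
)
gridSearchClassifier.fit(Xtrain, ytrain)

tfidf = gridSearchClassifier.best_estimator_.named_steps["vect"]
clf = gridSearchClassifier.best_estimator_.named_steps["clf"]

new_X = ["wonderful acting", "awful movie"]

# Route 1: vectorize with the fitted TF-IDF step, then classify.
X_vec = tfidf.transform(new_X)
step_pred = clf.predict(X_vec)

# Route 2: let the grid search object run the whole best pipeline.
pipe_pred = gridSearchClassifier.predict(new_X)

same = np.array_equal(step_pred, pipe_pred)
print(same)
```

Both routes use the same fitted objects, so the second one is usually the more convenient choice unless you need the intermediate TF-IDF matrix.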