
Python sklearn: fit_transform() does not work for GridSearchCV

I am creating a GridSearchCV classifier as follows:

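from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

pipeline = Pipeline([
    ('vect', TfidfVectorizer(stop_words='english', sublinear_tf=True)),
    ('clf', LogisticRegression())
])

# Empty grid: only the pipeline's default hyperparameters are evaluated
parameters = {}

gridSearchClassifier = GridSearchCV(pipeline, parameters, n_jobs=3, verbose=1, scoring='accuracy')

# Fit/train the gridSearchClassifier on the training set
gridSearchClassifier.fit(Xtrain, ytrain)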

This works well, and I can predict. However, now I want to retrain the classifier. For this I want to call fit_transform() on some feedback data:

    gridSearchClassifier.fit_transform(Xnew, yNew)

But I get this error:

AttributeError: 'GridSearchCV' object has no attribute 'fit_transform'

Basically, I am trying to call fit_transform() on the classifier's internal TfidfVectorizer. I know that I can access the Pipeline's internal components using the named_steps attribute. Can I do something similar for the gridSearchClassifier?

Just call them step by step:

gridSearchClassifier.fit(Xnew, yNew)
# Delegates to best_estimator_; works only if that estimator implements transform()
transformed = gridSearchClassifier.transform(Xnew)

fit_transform is nothing more than these two lines of code; it is simply not implemented as a single method for GridSearchCV.
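To illustrate the equivalence on a plain transformer, here is a minimal sketch using a standalone TfidfVectorizer rather than the GridSearchCV object from the question. The documents in `docs` are made up for the example.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer

# Made-up toy documents, only for illustration.
docs = ["good movie", "great film", "bad movie", "awful film"]

# One call: learn the vocabulary and IDF weights, then return the matrix.
one_step = TfidfVectorizer().fit_transform(docs)

# Two calls: fit first, then transform the same documents.
vect = TfidfVectorizer()
vect.fit(docs)
two_step = vect.transform(docs)

# Both routes should produce the same TF-IDF matrix.
equivalent = np.allclose(one_step.toarray(), two_step.toarray())
print(equivalent)
```

The two-step form is what you fall back to whenever an object offers fit() and transform() but no combined method.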

Update

From the comments it seems that you are a bit confused about what GridSearchCV actually does. It is a meta-method for fitting a model over multiple hyperparameter settings. Thus, once you call fit, you get an estimator in the best_estimator_ attribute of your object. In your case it is a pipeline, and you can extract any part of it as usual:

gridSearchClassifier.fit(Xtrain, ytrain)
clf = gridSearchClassifier.best_estimator_
# do something with clf, its elements, etc.
# for example: print(clf.named_steps['vect'])

You should not use GridSearchCV as a classifier; it is only a method for fitting hyperparameters. Once you have found them, you should work with best_estimator_ instead. However, remember that if you refit the TF-IDF vectorizer, then your classifier will be useless: you cannot change the data representation and expect the old model to work well. You have to refit the whole classifier once your data changes (unless this is a carefully designed change and you make sure the old dimensions mean exactly the same thing; sklearn does not support such operations, so you would have to implement this from scratch).
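A small sketch of why refitting only the vectorizer breaks the old classifier, and of the alternative of refitting the whole pipeline. The toy documents, labels, and variable names here are made up for illustration, and combining old and new data for the refit is an assumption about how the feedback data would be used.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

# Made-up toy data, only for illustration.
X_old = ["good movie", "great film", "bad movie", "awful film"]
y_old = [1, 1, 0, 0]
X_new = ["terrible plot and acting", "wonderful plot and acting"]
y_new = [0, 1]

pipeline = Pipeline([
    ("vect", TfidfVectorizer()),
    ("clf", LogisticRegression()),
])
pipeline.fit(X_old, y_old)
clf = pipeline.named_steps["clf"]
old_width = len(pipeline.named_steps["vect"].vocabulary_)

# Fit a separate vectorizer on the new data only: its vocabulary, and
# therefore the meaning and number of the columns, is different.
refit_vect = TfidfVectorizer()
X_refit = refit_vect.fit_transform(X_new)
new_width = X_refit.shape[1]

# The old classifier was trained on the old columns, so it rejects the input.
try:
    clf.predict(X_refit)
    mismatch = False
except ValueError:
    mismatch = True

# Refit the whole pipeline instead, so the vectorizer and the classifier
# agree on the feature space (here: on old and new data together).
pipeline.fit(X_old + X_new, y_old + y_new)
predictions = pipeline.predict(["great acting"])
```

Even when the feature counts happen to match, the columns would refer to different words, so the old coefficients would still be meaningless.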

@lejot is correct that you should call fit() on the gridSearchClassifier.

Provided refit=True is set on the GridSearchCV, which is the default, you can access best_estimator_ on the fitted gridSearchClassifier.
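As a rough sketch of what the refit flag changes, the following compares two searches on made-up toy data. The documents, the `clf__C` grid, and `cv=2` are assumptions chosen only to keep the example small.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

# Made-up toy data, only for illustration.
X = [
    "good movie", "great film", "lovely story", "wonderful acting",
    "bad movie", "awful film", "boring story", "terrible acting",
]
y = [1, 1, 1, 1, 0, 0, 0, 0]

pipeline = Pipeline([
    ("vect", TfidfVectorizer()),
    ("clf", LogisticRegression()),
])
param_grid = {"clf__C": [0.1, 1.0, 10.0]}  # hypothetical grid

# refit=True (the default): the best setting is refit on all of X, y.
search_refit = GridSearchCV(pipeline, param_grid, cv=2, scoring="accuracy", refit=True)
search_refit.fit(X, y)
has_best_with_refit = hasattr(search_refit, "best_estimator_")

# refit=False: the search still reports the best parameters, but no final
# estimator is trained, so best_estimator_ is not set.
search_no_refit = GridSearchCV(pipeline, param_grid, cv=2, scoring="accuracy", refit=False)
search_no_refit.fit(X, y)
has_best_without_refit = hasattr(search_no_refit, "best_estimator_")

print(has_best_with_refit, has_best_without_refit)
print(search_no_refit.best_params_)
```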

You can access the already fitted steps:

tfidf = gridSearchClassifier.best_estimator_.named_steps['vect']
clf = gridSearchClassifier.best_estimator_.named_steps['clf']

You can then transform new text in new_X using:

X_vec = tfidf.transform(new_X)

You can make predictions using this X_vec with:

x_pred = clf.predict(X_vec)

You can also make predictions by sending the text through the entire pipeline with:

X_pred = gridSearchClassifier.predict(new_X)
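Putting the pieces together, here is a self-contained sketch that checks the manual two-step route against the pipeline route. The toy data, the `clf__C` grid, and `cv=2` are assumptions made for the example; stop_words is left out so the tiny vocabulary is not filtered away.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

# Made-up toy data, only for illustration.
Xtrain = [
    "good movie", "great film", "lovely story", "wonderful acting",
    "bad movie", "awful film", "boring story", "terrible acting",
]
ytrain = [1, 1, 1, 1, 0, 0, 0, 0]
new_X = ["great story", "awful acting"]

pipeline = Pipeline([
    ("vect", TfidfVectorizer(sublinear_tf=True)),
    ("clf", LogisticRegression()),
])
search = GridSearchCV(pipeline, {"clf__C": [0.1, 1.0, 10.0]}, cv=2, scoring="accuracy")
search.fit(Xtrain, ytrain)

# Pull the fitted steps out of the refit best pipeline.
tfidf = search.best_estimator_.named_steps["vect"]
clf = search.best_estimator_.named_steps["clf"]

# Manual route: vectorize, then classify.
manual_pred = clf.predict(tfidf.transform(new_X))

# Pipeline route: GridSearchCV.predict delegates to best_estimator_.
pipeline_pred = search.predict(new_X)

same = np.array_equal(manual_pred, pipeline_pred)
print(same)
```

Both routes use the same fitted objects, so the predictions should agree; the pipeline route is simply less error-prone.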

