Scaling data in RFECV with scikit-learn

It is common to scale the training and testing data separately (fitting the scaler on the training data only) before the training and prediction steps of a classification task.

I want to embed this process in RFECV, which runs cross-validation tests, so I tried the following:

First, I ran X_scaled = preprocessing.scale(X) up front, where X is the whole data set. With this approach the training and testing data are not scaled separately (the scaling statistics are also computed over the rows used for testing), which is not what I want.
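To make the difference concrete, here is a minimal sketch of scaling the whole data set up front versus fitting StandardScaler on the training rows only and reusing its statistics on the test rows. The synthetic data from make_classification and the single train_test_split are assumptions for illustration, not the asker's setup:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler, scale

# Synthetic stand-in for the real X, y (assumption for illustration).
X, y = make_classification(n_samples=200, n_features=10, random_state=0)

# Leaky: mean and standard deviation are computed over every row,
# including the rows that will later be used for testing.
X_scaled_all = scale(X)

# Per-split: fit the scaler on the training rows only...
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)
scaler = StandardScaler().fit(X_train)
X_train_scaled = scaler.transform(X_train)
# ...then apply the training mean/std unchanged to the test rows.
X_test_scaled = scaler.transform(X_test)

print(X_train_scaled.mean(axis=0).round(6))  # ~0 for every feature
print(X_test_scaled.mean(axis=0).round(6))   # generally not exactly 0
```

Repeating the second pattern by hand inside every cross-validation fold is what a Pipeline automates, which is why a pipeline is the natural thing to hand to RFECV.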

The other approach I tried was to pass:

# class_weight='auto' was replaced by 'balanced' in newer scikit-learn releases
scaling_svm = Pipeline([('scaler', preprocessing.StandardScaler()),
                        ('svm', LinearSVC(penalty=penalty, dual=False, class_weight='balanced'))])

as the estimator argument of RFECV:

rfecv = RFECV(estimator=scaling_svm, step=1,
              cv=StratifiedKFold(n_splits=7),  # StratifiedKFold(y, 7) before scikit-learn 0.18
              scoring=score, verbose=0)

However, I got an error because RFECV requires the estimator to have a .coef_ attribute, which a Pipeline does not expose. What should I do? Any help would be appreciated.
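The error can be reproduced directly: a fitted Pipeline has no coef_ of its own, while its final step does. A small sketch, assuming synthetic data, the step name 'svm' from the code above, and penalty='l2' as a stand-in for the asker's penalty variable:

```python
from sklearn.datasets import make_classification
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

# Synthetic binary classification data (assumption for illustration).
X, y = make_classification(n_samples=200, n_features=10, random_state=0)

scaling_svm = Pipeline([('scaler', StandardScaler()),
                        ('svm', LinearSVC(penalty='l2', dual=False, class_weight='balanced'))])
scaling_svm.fit(X, y)

# The pipeline object itself has no coef_, which is what RFECV looks for...
print(hasattr(scaling_svm, 'coef_'))               # False
# ...while the LinearSVC inside it does: one row of weights for binary y.
print(scaling_svm.named_steps['svm'].coef_.shape)  # (1, 10)
```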

A bit late to the party, admittedly, but if anyone is interested you can create a customised pipeline as follows:

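from sklearn.pipeline import Pipeline

class RfePipeline(Pipeline):
    """Pipeline that exposes its final step's coef_ so RFECV can rank features."""
    @property
    def coef_(self):
        # Delegate to the last estimator in the pipeline (e.g. the LinearSVC).
        return self._final_estimator.coef_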

And then replace Pipeline with RfePipeline in your code.
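Putting it together, here is a minimal end-to-end sketch. The synthetic data from make_classification, scoring='accuracy', penalty='l2', and the current StratifiedKFold(n_splits=...) signature are assumptions standing in for the question's X, y, score and penalty:

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFECV
from sklearn.model_selection import StratifiedKFold
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC


class RfePipeline(Pipeline):
    @property
    def coef_(self):
        # Expose the final estimator's coefficients so RFECV can rank features.
        return self._final_estimator.coef_


# Synthetic data standing in for the real X, y (assumption).
X, y = make_classification(n_samples=300, n_features=15, n_informative=4,
                           random_state=0)

scaling_svm = RfePipeline([
    ('scaler', StandardScaler()),  # refit on the training part of every fold
    ('svm', LinearSVC(penalty='l2', dual=False, class_weight='balanced')),
])

rfecv = RFECV(estimator=scaling_svm, step=1,
              cv=StratifiedKFold(n_splits=7), scoring='accuracy')
rfecv.fit(X, y)

print(rfecv.n_features_)  # number of features kept
print(rfecv.support_)     # boolean mask over the 15 input features
```

Because RFECV clones and refits the whole pipeline on each training fold, the scaler never sees the held-out rows, which addresses the concern with preprocessing.scale(X). The subclass is only needed because RFECV looks for coef_ directly on the estimator it is given.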

See a similar question here.
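On newer scikit-learn releases (0.24 and later, as far as I know), the RfePipeline subclass is not strictly necessary: RFECV accepts an importance_getter argument that can be a dotted attribute path into the fitted estimator. A sketch, assuming the final step is named 'svm' as in the question and using synthetic data:

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFECV
from sklearn.model_selection import StratifiedKFold
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

# Synthetic data standing in for the real X, y (assumption).
X, y = make_classification(n_samples=300, n_features=15, n_informative=4,
                           random_state=0)

# A plain Pipeline; no subclass needed.
scaling_svm = Pipeline([
    ('scaler', StandardScaler()),
    ('svm', LinearSVC(penalty='l2', dual=False, class_weight='balanced')),
])

# Tell RFECV where the coefficients live instead of looking for pipeline.coef_.
rfecv = RFECV(estimator=scaling_svm, step=1,
              cv=StratifiedKFold(n_splits=7), scoring='accuracy',
              importance_getter='named_steps.svm.coef_')
rfecv.fit(X, y)
print(rfecv.n_features_)
```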
