I am using sklearn.feature_selection.RFECV:

ref = RFECV(lr, step=1, cv=5, scoring="r2")
ref.fit(X_ndarr, y_ndarr)
print(ref.grid_scores_)
I get:
[ 0.9316829   0.93472609  0.79440118 -2.37744438 -1.20559428 -1.35899883 -0.90087801 -1.02047363 -0.54169276 -0.08116821 -0.00685128  0.1561999  -0.26433411 -0.27843449 -0.32703359 -0.32782641 -0.30881354  0.11878835  0.08175137  0.04300757  0.0378917   0.04534877]
RFECV removes the least important feature at each step, so the score for, e.g., 10 features should be the best score achieved with any 10 features. But when I run the code below using 10 features selected another way:
from sklearn import linear_model
from sklearn.model_selection import cross_val_score

lr = linear_model.LinearRegression()
scores = cross_val_score(lr, X_top10_ndarr, y_ndarr, cv=5) # top10 features
Then I get:
cross-validation scores: [0.96706997 0.9653103 0.96386666 0.96017565 0.96603127]
All of the scores are around 0.96, while the score with 10 features from RFECV is -0.08.
What exactly is happening here?
EDIT1: The number of selected features is 2, and the ranking_ is as follows:
[ 4 7 1 6 3 2 8 11 5 10 21 9 12 14 13 15 16 19 18 17 1 20]
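As a quick sanity check on the ranking above (a minimal sketch; the array is copied verbatim from this edit): the features whose ranking_ value is 1 are the selected ones, which is consistent with 2 selected features, since rank 1 appears twice.

```python
import numpy as np

# ranking_ as reported in the question; rank 1 marks a selected feature.
ranking = np.array([4, 7, 1, 6, 3, 2, 8, 11, 5, 10, 21,
                    9, 12, 14, 13, 15, 16, 19, 18, 17, 1, 20])
selected = np.flatnonzero(ranking == 1)  # indices of the selected features
print(selected)
```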
ref.grid_scores_ represents the cross-validation scores: grid_scores_[i] corresponds to the CV score of the i-th subset of features. With step=1, the subset sizes grow from 1 up to the full feature count, so grid_scores_[i] is the score obtained with i + 1 features.
Refer to this answer for more understanding of these values.
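To make the indexing concrete, here is a minimal sketch (the score values are copied from the question; the mapping assumes the default min_features_to_select=1):

```python
# grid_scores_[i] is the CV score obtained with (i + 1) features
# (step=1, min_features_to_select=1 assumed).
grid_scores = [0.9316829, 0.93472609, 0.79440118, -2.37744438,
               -1.20559428, -1.35899883, -0.90087801, -1.02047363,
               -0.54169276, -0.08116821]  # first 10 values from the question

for i, score in enumerate(grid_scores):
    print(f"{i + 1:2d} features -> CV score {score: .4f}")

# The entry for a 10-feature subset is grid_scores[9]:
print(grid_scores[9])
```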
Going by that explanation, the model's CV score for 10 features would be grid_scores_[9] = -0.08116821. That said, the score is really bad: it is negative, which suggests a linear model may simply not be a good fit for your data set.
One more point to note: even the best score across all subset sizes is only 0.93472609 (with 2 features), which is still less than 0.96.
Maybe set a random_state on the CV splitter (KFold here, since this is a regression task; StratifiedKFold is for classification) and pass that splitter as the cv parameter, so that both runs are scored on the same folds.
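A minimal sketch of that suggestion, on synthetic data (make_regression here is only a stand-in for the real X and y): seed one KFold splitter and pass it to both RFECV and cross_val_score, so both scores come from identical folds and are directly comparable.

```python
from sklearn.datasets import make_regression
from sklearn.feature_selection import RFECV
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score

# Stand-in data with the same shape as the question's (22 features).
X, y = make_regression(n_samples=100, n_features=22, n_informative=5,
                       random_state=0)

cv = KFold(n_splits=5, shuffle=True, random_state=0)  # fixed, reusable splits
lr = LinearRegression()

selector = RFECV(lr, step=1, cv=cv, scoring="r2").fit(X, y)

# Score the RFECV-selected features on the very same folds.
scores = cross_val_score(lr, X[:, selector.support_], y, cv=cv, scoring="r2")
print(selector.n_features_, scores.mean())
```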