ROC curve for KFold in scikit-learn: works well with StratifiedKFold but shows an error with KFold

Question

I want to plot a ROC curve using KFold cross-validation. However, the example code on the scikit-learn page uses StratifiedKFold. When I replace StratifiedKFold with KFold, the plots show None results. What could be the problem?

Sample code is at ScikitLearn

I replaced StratifiedKFold with KFold, but it does not work.

Answer 1

Shuffle the data before cutting the folds:

from sklearn.model_selection import KFold

cv = KFold(n_splits=6, shuffle=True)
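To show where that line fits, here is a minimal sketch of a per-fold ROC computation with a shuffled KFold. It is not the scikit-learn example itself: the LogisticRegression classifier, random_state=0, and the variable names are assumptions chosen for illustration, and plotting is left out.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import auc, roc_curve
from sklearn.model_selection import KFold

# Binary subset of Iris (classes 0 and 1); the samples are sorted by class.
X, y = load_iris(return_X_y=True)
X, y = X[y != 2], y[y != 2]

# shuffle=True mixes the samples before the folds are cut.
# random_state=0 is an assumption, used only to make the run repeatable.
cv = KFold(n_splits=6, shuffle=True, random_state=0)

mean_fpr = np.linspace(0, 1, 100)
tprs, aucs = [], []

for train, test in cv.split(X, y):
    # LogisticRegression is a stand-in classifier for this sketch.
    clf = LogisticRegression(max_iter=1000).fit(X[train], y[train])
    scores = clf.predict_proba(X[test])[:, 1]
    fpr, tpr, _ = roc_curve(y[test], scores)
    # Interpolate onto a common FPR grid so the folds can be averaged.
    tprs.append(np.interp(mean_fpr, fpr, tpr))
    aucs.append(auc(fpr, tpr))

mean_tpr = np.mean(tprs, axis=0)
print("AUC per fold:", np.round(aucs, 3))
print("Mean AUC:", round(float(auc(mean_fpr, mean_tpr)), 3))
```

Each (fpr, tpr) pair and mean_tpr can then be passed to matplotlib in the same way as in the StratifiedKFold example.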

Explanation:

StratifiedKFold cuts the folds so that the proportion of classes in each fold is roughly the same as in the whole dataset. KFold does not do that: without shuffling, it simply cuts the folds from the samples in the order in which they appear in the dataset. Hence you may or may not get all classes present in every fold. In the case of the Iris dataset, the samples are sorted by class, as can be seen from the target y:

array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1])

Hence, with a 6-fold split, most folds contain only one class, the ROC curve cannot be computed for those folds, and the whole procedure breaks. Shuffling makes it very likely that KFold picks up both classes in every fold, and everything works fine.
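The fold composition described above can be checked directly. The following is a small sketch that counts the distinct classes in each test fold for the three splitters; the helper name classes_per_test_fold and random_state=0 are assumptions introduced for illustration.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import KFold, StratifiedKFold

# Binary subset of Iris (classes 0 and 1), still sorted by class.
X, y = load_iris(return_X_y=True)
X, y = X[y != 2], y[y != 2]


def classes_per_test_fold(cv, X, y):
    """Return the number of distinct classes in each test fold (hypothetical helper)."""
    return [len(np.unique(y[test])) for _, test in cv.split(X, y)]


plain = classes_per_test_fold(KFold(n_splits=6), X, y)
shuffled = classes_per_test_fold(
    KFold(n_splits=6, shuffle=True, random_state=0), X, y
)
stratified = classes_per_test_fold(StratifiedKFold(n_splits=6), X, y)

print("KFold, no shuffle:", plain)  # [1, 1, 2, 1, 1, 1]
print("KFold, shuffled:  ", shuffled)
print("StratifiedKFold:  ", stratified)
```

Without shuffling, only the fold that straddles the boundary between the two classes contains both of them. StratifiedKFold guarantees both classes in every fold, whereas a shuffled KFold only makes that outcome very likely.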

1 answers

solution1
0 2019-01-27 21:21:31
