I want to plot a ROC curve using KFold cross-validation. However, the example code on the scikit-learn page uses StratifiedKFold. When I replace StratifiedKFold with KFold, I get None results in the plots. What can be the problem?
The sample code is from the scikit-learn site. I replaced StratifiedKFold with KFold, but it does not work.
Shuffle the data before cutting the folds:
cv = KFold(n_splits=6, shuffle=True)
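For context, here is a minimal sketch of how that line might fit into a per-fold ROC loop. It is not the scikit-learn example itself: the LogisticRegression classifier, the two-class Iris subset, and random_state=0 are assumptions chosen to keep the example short and reproducible.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import auc, roc_curve
from sklearn.model_selection import KFold

# Assumption: binary problem built from the first two Iris classes.
X, y = load_iris(return_X_y=True)
X, y = X[y != 2], y[y != 2]

# shuffle=True mixes the class-sorted samples before the folds are cut.
cv = KFold(n_splits=6, shuffle=True, random_state=0)

aucs = []
for train, test in cv.split(X):
    # Assumption: any classifier exposing predict_proba would do here.
    clf = LogisticRegression(max_iter=1000).fit(X[train], y[train])
    scores = clf.predict_proba(X[test])[:, 1]
    # roc_curve needs both classes in y[test] to give a defined curve.
    fpr, tpr, _ = roc_curve(y[test], scores)
    aucs.append(auc(fpr, tpr))

print(aucs)
```

Each (fpr, tpr) pair can then be passed to the plotting code in place of the StratifiedKFold version.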
Explanation:
StratifiedKFold cuts folds such that the proportion of classes in each fold is roughly the same as in the whole dataset. KFold does not do that; by default it just cuts folds from the samples in the order in which they appear in the dataset. Hence you may or may not get all classes present in every fold. In the case of the Iris dataset, the samples are sorted by class, as can be seen from the target y:
array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1])
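Hence, with a 6-fold split, most test folds contain only one class. The ROC curve is undefined for such a fold, because the true positive rate or the false positive rate cannot be computed when one class is missing, so the whole evaluation breaks. Shuffling makes it all but certain that KFold picks up both classes in every fold, and everything works fine.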
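The effect can be checked directly by listing which classes land in each test fold. The following is a small sketch, assuming the same two-class Iris subset as in the target shown above; the helper name classes_per_test_fold and random_state=0 are made up for illustration.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import KFold

# Assumption: keep only classes 0 and 1, giving 100 samples sorted by class.
X, y = load_iris(return_X_y=True)
X, y = X[y != 2], y[y != 2]

def classes_per_test_fold(cv):
    """Return the list of distinct classes found in each test fold."""
    return [np.unique(y[test]).tolist() for _, test in cv.split(X)]

# Default KFold: folds are contiguous slices of the class-sorted data.
unshuffled = classes_per_test_fold(KFold(n_splits=6))

# Shuffled KFold: samples are permuted before the folds are cut.
shuffled = classes_per_test_fold(
    KFold(n_splits=6, shuffle=True, random_state=0)
)

print("unshuffled:", unshuffled)
print("shuffled:  ", shuffled)
```

Without shuffling, only the fold that straddles the boundary between the two classes contains both of them; the other five contain a single class.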