Hi, if we calculate cross-validated accuracy as follows:
cv_acc = cross_val_score(model, X_train, y_train, cv=5, scoring='accuracy')
is it possible to obtain the y predictions and build a confusion matrix (as below) using the same k-fold splits that cross_val_score used?
y_pred = cross_val_predict(model, X_train, y_train, cv=5)
conf_mat = confusion_matrix(y_train, y_pred)  # y_pred is aligned with y_train, not y_test
Is there a way to store exactly how cross_val_score splits the k-folds, so that the confusion matrix is computed on the same folds and the results are comparable?
Cheers :)
The following should work:
from sklearn.model_selection import KFold, cross_val_score, cross_val_predict
k_folds = KFold(n_splits=5)
splits = list(k_folds.split(X_train, y_train))  # wrap in list(): k_folds.split returns a one-shot generator
cv_acc = cross_val_score(model, X_train, y_train, cv=splits, scoring='accuracy')
y_pred = cross_val_predict(model, X_train, y_train, cv=splits)
See the docs for cross_val_score and cross_val_predict for more detail. One caveat: for classifiers, passing an integer cv=5 actually uses StratifiedKFold under the hood, so if you want to reproduce the default cv=5 splits exactly, store the splits from StratifiedKFold(n_splits=5) instead of plain KFold.
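The reason the stored splits can feed both functions is that in k-fold cross-validation every sample lands in exactly one test fold, so cross_val_predict can return one out-of-fold prediction per sample. A minimal sketch verifying this property (toy data, not the model from the question):

```python
import numpy as np
from sklearn.model_selection import KFold

# Toy data: 10 samples, 2 features.
X = np.arange(20).reshape(10, 2)
splits = list(KFold(n_splits=5).split(X))

# Concatenating the test indices of all folds recovers every sample
# exactly once, which is what makes cross_val_predict well-defined.
test_idx = np.concatenate([test for _, test in splits])
assert np.array_equal(np.sort(test_idx), np.arange(10))
```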
A full working example:
from sklearn.model_selection import KFold, cross_val_score, cross_val_predict
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
k_folds = KFold(n_splits=5)
X_train, y_train = make_classification(1000)
splits = list(k_folds.split(X_train, y_train))
model = LogisticRegression()
cv_acc = cross_val_score(model, X_train, y_train, cv=splits, scoring='accuracy')
y_pred = cross_val_predict(model, X_train, y_train, cv=splits)
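To answer the confusion-matrix part directly, the example can be extended as below. This is a sketch; random_state=0 is an added assumption for reproducibility, and the final assertion relies on all five folds having equal size (1000 / 5 = 200):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import KFold, cross_val_predict, cross_val_score

X_train, y_train = make_classification(1000, random_state=0)
splits = list(KFold(n_splits=5).split(X_train, y_train))

model = LogisticRegression()
cv_acc = cross_val_score(model, X_train, y_train, cv=splits, scoring="accuracy")
y_pred = cross_val_predict(model, X_train, y_train, cv=splits)

# y_pred holds one out-of-fold prediction per training sample, so it is
# compared against y_train, not a separate test set.
conf_mat = confusion_matrix(y_train, y_pred)
print(conf_mat)

# Sanity check: because every fold has the same size, the accuracy pooled
# over all out-of-fold predictions equals the mean of the per-fold scores.
assert abs(accuracy_score(y_train, y_pred) - cv_acc.mean()) < 1e-9
```

If the fold sizes were unequal, the pooled accuracy and the mean fold score could differ slightly, but the confusion matrix itself is valid either way.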