Hi, if we calculate cross-validated accuracy as follows:
cv_acc = cross_val_score(model, X_train, y_train, cv=5, scoring='accuracy')
is it possible to obtain the y predictions and build a confusion matrix (as below) using the same k-fold splits that cross_val_score used?
y_pred = cross_val_predict(model, X_train, y_train, cv=5)
conf_mat = confusion_matrix(y_train, y_pred)  # y_pred is aligned with y_train, not y_test
Is there a way to store exactly how cross_val_score splits the k-folds, so that the confusion matrix is computed on the same folds and the results are comparable?
Cheers :)
The following should work:
from sklearn.model_selection import KFold, cross_val_score, cross_val_predict
k_folds = KFold(n_splits=5)
splits = list(k_folds.split(X_train, y_train))  # wrap in list(): k_folds.split returns a one-shot generator
cv_acc = cross_val_score(model, X_train, y_train, cv=splits, scoring='accuracy')
y_pred = cross_val_predict(model, X_train, y_train, cv=splits)
See the docs for cross_val_score and cross_val_predict for more detail. One caveat: for classifiers, passing an integer cv=5 actually uses StratifiedKFold under the hood, so if you want to reproduce the default cv=5 splits exactly, store the splits from StratifiedKFold(n_splits=5) instead of plain KFold.
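The reason the stored splits can feed both functions is that in k-fold cross-validation every sample lands in exactly one test fold, so cross_val_predict can return one out-of-fold prediction per sample. A minimal sketch verifying this property (toy data, not the model from the question):

```python
import numpy as np
from sklearn.model_selection import KFold

# Toy data: 10 samples, 2 features.
X = np.arange(20).reshape(10, 2)
splits = list(KFold(n_splits=5).split(X))

# Concatenating the test indices of all folds recovers every sample
# exactly once, which is what makes cross_val_predict well-defined.
test_idx = np.concatenate([test for _, test in splits])
assert np.array_equal(np.sort(test_idx), np.arange(10))
```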
A full working example:
from sklearn.model_selection import KFold, cross_val_score, cross_val_predict
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
k_folds = KFold(n_splits=5)
X_train, y_train = make_classification(1000)
splits = list(k_folds.split(X_train, y_train))
model = LogisticRegression()
cv_acc = cross_val_score(model, X_train, y_train, cv=splits, scoring='accuracy')
y_pred = cross_val_predict(model, X_train, y_train, cv=splits)
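To answer the confusion-matrix part directly, the example can be extended as below. This is a sketch; random_state=0 is an added assumption for reproducibility, and the final assertion relies on all five folds having equal size (1000 / 5 = 200):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import KFold, cross_val_predict, cross_val_score

X_train, y_train = make_classification(1000, random_state=0)
splits = list(KFold(n_splits=5).split(X_train, y_train))

model = LogisticRegression()
cv_acc = cross_val_score(model, X_train, y_train, cv=splits, scoring="accuracy")
y_pred = cross_val_predict(model, X_train, y_train, cv=splits)

# y_pred holds one out-of-fold prediction per training sample, so it is
# compared against y_train, not a separate test set.
conf_mat = confusion_matrix(y_train, y_pred)
print(conf_mat)

# Sanity check: because every fold has the same size, the accuracy pooled
# over all out-of-fold predictions equals the mean of the per-fold scores.
assert abs(accuracy_score(y_train, y_pred) - cv_acc.mean()) < 1e-9
```

If the fold sizes were unequal, the pooled accuracy and the mean fold score could differ slightly, but the confusion matrix itself is valid either way.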