
Is it possible to use the same k-folds in cross_val_predict that are in cross_val_score?

Hi, suppose we calculate cross-validated accuracy like this:

cv_acc = cross_val_score(model, X_train, y_train, cv=5, scoring='accuracy')    

Is it possible to get the y predictions and build a confusion matrix (as below) using the same k-folds that cross_val_score used?

y_pred = cross_val_predict(model, X_train, y_train, cv=5)
conf_mat = confusion_matrix(y_train, y_pred)

Is there a way to store exactly how cross_val_score splits the k-folds, so that the confusion matrix is comparable?

Cheers :)

The following should work:

from sklearn.model_selection import KFold, cross_val_score, cross_val_predict
k_folds = KFold(n_splits=5)
splits = list(k_folds.split(X_train, y_train)) # note list here as k_folds.split is a one-off generator
cv_acc = cross_val_score(model, X_train, y_train, cv=splits, scoring='accuracy')
y_pred = cross_val_predict(model, X_train, y_train, cv=splits)

See the scikit-learn documentation for cross_val_score and cross_val_predict for more.
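If shuffled folds are wanted, materialising the splits with list() is one option; another that should behave the same way is to give the splitter an integer random_state, since each call to split() then reproduces the same folds. A minimal sketch of that check, where X_demo and the seed value 0 are placeholders chosen only for illustration:

```python
import numpy as np
from sklearn.model_selection import KFold

# Toy data, used only to demonstrate the splitting behaviour.
X_demo = np.arange(20).reshape(10, 2)

# With shuffle=True and an integer random_state, every call to split()
# yields the same folds.
shuffled_folds = KFold(n_splits=5, shuffle=True, random_state=0)

first = [test_idx for _, test_idx in shuffled_folds.split(X_demo)]
second = [test_idx for _, test_idx in shuffled_folds.split(X_demo)]

# True when both passes produced identical test indices for every fold.
same_folds = all(np.array_equal(a, b) for a, b in zip(first, second))
print(same_folds)
```

Under that assumption, the same shuffled_folds object could be passed as cv= to both cross_val_score and cross_val_predict instead of a pre-computed list.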

A full working example:

from sklearn.model_selection import KFold, cross_val_score, cross_val_predict
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

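k_folds = KFold(n_splits=5)
X_train, y_train = make_classification(n_samples=1000)
# Materialise the splits once so both calls below see identical folds
splits = list(k_folds.split(X_train, y_train))
model = LogisticRegression()
# Per-fold accuracy and out-of-fold predictions from the same splits
cv_acc = cross_val_score(model, X_train, y_train, cv=splits, scoring='accuracy')
y_pred = cross_val_predict(model, X_train, y_train, cv=splits)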
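To finish the confusion matrix part of the question, the out-of-fold predictions can be compared against y_train (not y_test, since y_pred has one entry per training sample). The sketch below also recomputes per-fold accuracy from y_pred as a sanity check that both functions used the same folds. The random_state on make_classification and the names fold_acc and folds_match are additions for illustration, not part of the original answer:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import KFold, cross_val_predict, cross_val_score

# Fixed seed so the example is repeatable (an assumption, not in the original).
X_train, y_train = make_classification(n_samples=1000, random_state=0)
model = LogisticRegression()

# Materialise the folds once and reuse them for both calls.
splits = list(KFold(n_splits=5).split(X_train, y_train))

cv_acc = cross_val_score(model, X_train, y_train, cv=splits, scoring="accuracy")
y_pred = cross_val_predict(model, X_train, y_train, cv=splits)

# Confusion matrix over all out-of-fold predictions.
conf_mat = confusion_matrix(y_train, y_pred)

# Recompute accuracy fold by fold from y_pred and compare with cv_acc.
fold_acc = np.array(
    [accuracy_score(y_train[test_idx], y_pred[test_idx]) for _, test_idx in splits]
)
folds_match = np.allclose(fold_acc, cv_acc)

print(conf_mat)
print(folds_match)
```

If folds_match is True, the confusion matrix and the cross-validated accuracy describe the same held-out predictions.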

