
Why is cross_val_score different from when I calculate it manually?

Here is the reproducible example code:

from numpy import mean
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_validate
from sklearn.model_selection import StratifiedShuffleSplit
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import balanced_accuracy_score

# define dataset
X, y = make_classification(n_samples=1000, weights = [0.3,0.7], n_features=100, n_informative=75, random_state=0)
# define the model
model = RandomForestClassifier(n_estimators=10, random_state=0)
# evaluate the model
n_splits=10
cv = StratifiedShuffleSplit(n_splits, random_state=0)
n_scores = cross_validate(model, X, y, scoring='balanced_accuracy', cv=cv, n_jobs=-1, error_score='raise')
# report performance
print('Accuracy: %0.4f' % (mean(n_scores['test_score'])))

bal_acc_sum = []
for train_index, test_index in cv.split(X,y):
    model.fit(X[train_index], y[train_index])                                      
    bal_acc_sum.append(balanced_accuracy_score(model.predict(X[test_index]),y[test_index]))

print(f"Accuracy: %0.4f" % (mean(bal_acc_sum)))

Result:

Accuracy: 0.6737
Accuracy: 0.7113

The result of my self-calculated accuracy is always higher than the one cross-validation gives me. But it should be the same, or am I missing something? Same metric, same split (KFold gives the same result), same fixed model (other models behave identically), same random state, but different results?

It is because, in your manual calculation, you have flipped the order of the arguments to balanced_accuracy_score, which matters: it should be (y_true, y_pred) (docs).
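
To see why the order matters: balanced accuracy is the macro-average of per-class recall, which is not symmetric in its two arguments. A minimal sketch with made-up labels (the values below are illustrative only, not from the question's data):

from sklearn.metrics import balanced_accuracy_score

y_true = [0, 0, 0, 1, 1]
y_pred = [0, 1, 1, 1, 1]

# correct order: mean of per-class recall = (1/3 + 2/2) / 2
print(balanced_accuracy_score(y_true, y_pred))  # 0.666...
# swapped order: recall is computed against the wrong "truth"
print(balanced_accuracy_score(y_pred, y_true))  # 0.75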

Changing this, your manual calculation becomes:

bal_acc_sum = []
for train_index, test_index in cv.split(X,y):
    model.fit(X[train_index], y[train_index])                                      
    bal_acc_sum.append(balanced_accuracy_score(y[test_index], model.predict(X[test_index])))  # change order of arguments here

print(f"Accuracy: %0.4f" % (mean(bal_acc_sum)))

Result:

Accuracy: 0.6737

And the per-fold scores now match cross_validate exactly:

import numpy as np
np.all(bal_acc_sum==n_scores['test_score'])
# True
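
For completeness, since the title asks about cross_val_score: it is a thin wrapper around cross_validate that returns only the test scores, so with the same cv and scoring it yields the same result (a minimal sketch, reusing the model, X, y, and cv defined above):

from sklearn.model_selection import cross_val_score

# returns just the array of per-fold test scores;
# equivalent to cross_validate(...)['test_score'] here
scores = cross_val_score(model, X, y, scoring='balanced_accuracy', cv=cv, n_jobs=-1)
print('Accuracy: %0.4f' % mean(scores))  # 0.6737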
