Why is cross_val_score different to when I calculate it manually?
Here is the reproducible example code:
from numpy import mean
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_validate
from sklearn.model_selection import StratifiedShuffleSplit
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import balanced_accuracy_score
# define dataset
X, y = make_classification(n_samples=1000, weights = [0.3,0.7], n_features=100, n_informative=75, random_state=0)
# define the model
model = RandomForestClassifier(n_estimators=10, random_state=0)
# evaluate the model
n_splits=10
cv = StratifiedShuffleSplit(n_splits, random_state=0)
n_scores = cross_validate(model, X, y, scoring='balanced_accuracy', cv=cv, n_jobs=-1, error_score='raise')
# report performance
print('Accuracy: %0.4f' % (mean(n_scores['test_score'])))
bal_acc_sum = []
for train_index, test_index in cv.split(X,y):
    model.fit(X[train_index], y[train_index])
    bal_acc_sum.append(balanced_accuracy_score(model.predict(X[test_index]),y[test_index]))
print(f"Accuracy: %0.4f" % (mean(bal_acc_sum)))
Result:
Accuracy: 0.6737
Accuracy: 0.7113
The result of my self-calculated accuracy is always higher than the one cross-validation gives me. But it should be the same, or am I missing something? Same metric, same split (KFold brings the same result), same fixed model (other models behave identically), same random state, but different results?
It is because, in your manual calculation, you have flipped the order of arguments in balanced_accuracy_score, which matters: it should be (y_true, y_pred) (docs).
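As a rough sketch of why the order matters: balanced accuracy is defined as the average of per-class recall, so the classes and their denominators are taken from whichever array is passed first. The helper `mean_per_class_recall` and the toy labels below are hypothetical, written in plain Python only to illustrate that definition; this is not scikit-learn's implementation.

```python
from collections import defaultdict

def mean_per_class_recall(y_true, y_pred):
    """Average of per-class recall, the quantity balanced accuracy is defined as."""
    correct = defaultdict(int)  # correctly predicted samples per true class
    total = defaultdict(int)    # number of samples per true class
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        if t == p:
            correct[t] += 1
    # Classes and denominators come only from the first argument.
    return sum(correct[c] / total[c] for c in total) / len(total)

# Hypothetical toy labels: class 1 is rarer, and the model misses one of them.
y_true = [0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 0, 0, 1]

right_order = mean_per_class_recall(y_true, y_pred)  # recalls 4/4 and 1/2
flipped = mean_per_class_recall(y_pred, y_true)      # denominators 5 and 1

print(f"(y_true, y_pred): {right_order:.4f}")  # 0.7500
print(f"(y_pred, y_true): {flipped:.4f}")      # 0.9000
```

With the arguments flipped, the denominators are the predicted class counts, so the number behaves like a macro-averaged precision rather than balanced accuracy. On imbalanced data the two can differ noticeably, which is consistent with the gap seen in the question.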
Changing this, your manual calculation becomes:
bal_acc_sum = []
for train_index, test_index in cv.split(X,y):
    model.fit(X[train_index], y[train_index])
    bal_acc_sum.append(balanced_accuracy_score(y[test_index], model.predict(X[test_index]))) # change order of arguments here
print(f"Accuracy: %0.4f" % (mean(bal_acc_sum)))
Result:
Accuracy: 0.6737
And
import numpy as np
np.all(bal_acc_sum==n_scores['test_score'])
# True