Explaining the cv parameter of scikit-learn's cross_val_score

Question

I don't understand why this cross_val_score configuration and a simple, manually fitted model give me different results.

from sklearn.datasets import load_iris
from sklearn.utils import shuffle
from sklearn import tree
import numpy as np

np.random.seed(1234)
iris = load_iris()
X, y = iris.data, iris.target
X,y = shuffle(X,y)

print(y)
clf = tree.DecisionTreeClassifier(max_depth=2,class_weight={2: 0.3, 1: 10,0:0.3},random_state=1234)
clf2 = clf.fit(X, y)
tree.plot_tree(clf2)
from  sklearn.metrics import confusion_matrix
from sklearn.metrics import accuracy_score
predi = clf2.predict(X)
cm =  confusion_matrix(y_true=y, y_pred=predi)
print(cm)
print("Accuracy = ",round(accuracy_score(y,predi)* 100.0,2))

from sklearn.model_selection import cross_val_score,cross_val_predict
max_id = len(X)
limit = round(max_id*0.6,0)
min_id=0
train = np.arange(0,limit)
test = np.arange(limit,max_id)
test = [int(x) for x in test]
train = [int(x) for x in train]
print(train)
print(test)
predi = cross_val_score(clf,X,y,cv=[(train,test)])
print(predi)
Xtrain = X[train[0]:train[-1]]
y_train =  y[train[0]:train[-1]]
Xtest = X[test[0]:test[-1]]
y_test =  y[test[0]:test[-1]]


clf3 = clf.fit(Xtrain,y_train)
predi = clf3.predict(Xtest)
cm =  confusion_matrix(y_true=y_test, y_pred=predi)
print(cm)
print("Accuracy = ",round(accuracy_score(y_test,predi)* 100.0,2))

I don't understand why I get different accuracies when I use the same parameters on the same train/test samples.

Answer 1

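Basically, the way you split the data affects the accuracy you measure, which is well documented in machine learning. In addition, the evaluation of your first model is heavily biased because you tested on the same data you trained on, which inflates the score and can push the accuracy toward 100%.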

https://www.analyticsvidhya.com/blog/2021/05/4-ways-to-evaluate-your-machine-learning-model-cross-validation-techniques-with-python-code/

https://towardsdatascience.com/train-test-split-c3eed34f763b
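To make the comparison concrete, here is a minimal sketch of how the single split passed as `cv=[(train, test)]` can be reproduced by hand. It assumes the same 60/40 split and tree settings as in the question; the names `train_idx`, `test_idx`, `manual` and `n_sliced` are illustrative and not from the original post. It also shows that a slice such as `X[train[0]:train[-1]]` excludes its end index, which is one likely reason the manual run in the question sees slightly different data than cross_val_score does.

```python
import numpy as np
from sklearn import tree
from sklearn.base import clone
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import cross_val_score
from sklearn.utils import shuffle

X, y = load_iris(return_X_y=True)
X, y = shuffle(X, y, random_state=1234)

# Same 60/40 split as in the question, expressed as integer index arrays.
limit = int(round(len(X) * 0.6))
train_idx = np.arange(0, limit)
test_idx = np.arange(limit, len(X))

clf = tree.DecisionTreeClassifier(
    max_depth=2, class_weight={0: 0.3, 1: 10, 2: 0.3}, random_state=1234
)

# cross_val_score clones the estimator, fits it on X[train_idx],
# and scores it (accuracy for a classifier) on X[test_idx].
cv_score = cross_val_score(clf, X, y, cv=[(train_idx, test_idx)])[0]

# Manual equivalent: index with the full index arrays, not with a slice.
manual = clone(clf).fit(X[train_idx], y[train_idx])
manual_score = accuracy_score(y[test_idx], manual.predict(X[test_idx]))

# A slice X[a:b] excludes index b, so slicing up to the last index
# silently drops the last sample of the set.
n_sliced = len(X[train_idx[0]:train_idx[-1]])

print("cross_val_score:", cv_score)
print("manual fit:     ", manual_score)
print("train size via index array:", len(train_idx), "via slice:", n_sliced)
```

When the same index arrays are used on both sides, the two scores should agree; indexing with `X[train_idx]` rather than a start/end slice keeps every sample in the split.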

Solution 1
0 2022-05-31 09:55:57
