
Explanation of the cv parameter of scikit-learn's cross_val_score

I don't understand why I get different results from cross_val_score in this configuration and from a simple model fitted by hand.

from sklearn.datasets import load_iris
from sklearn.utils import shuffle
from sklearn import tree
import numpy as np

np.random.seed(1234)
iris = load_iris()
X, y = iris.data, iris.target
X,y = shuffle(X,y)

print(y)
clf = tree.DecisionTreeClassifier(max_depth=2, class_weight={2: 0.3, 1: 10, 0: 0.3}, random_state=1234)
clf2 = clf.fit(X, y)
tree.plot_tree(clf2)
from sklearn.metrics import confusion_matrix
from sklearn.metrics import accuracy_score
predi = clf2.predict(X)
cm = confusion_matrix(y_true=y, y_pred=predi)
print(cm)
print("Accuracy = ", round(accuracy_score(y, predi) * 100.0, 2))

from sklearn.model_selection import cross_val_score,cross_val_predict
max_id = len(X)
limit = round(max_id*0.6,0)
min_id=0
train = np.arange(0,limit)
test = np.arange(limit,max_id)
test = [int(x) for x in test]
train = [int(x) for x in train]
print(train)
print(test)
predi = cross_val_score(clf,X,y,cv=[(train,test)])
print(predi)
# Manual split built by slicing with the first and last entries of the index lists
Xtrain = X[train[0]:train[-1]]
y_train = y[train[0]:train[-1]]
Xtest = X[test[0]:test[-1]]
y_test = y[test[0]:test[-1]]

clf3 = clf.fit(Xtrain, y_train)
predi = clf3.predict(Xtest)
cm = confusion_matrix(y_true=y_test, y_pred=predi)
print(cm)
print("Accuracy = ", round(accuracy_score(y_test, predi) * 100.0, 2))

I don't understand why I get a different accuracy when I use the same parameters and the same train/test samples.

Basically, the way you split the data affects the accuracy you measure; this is well documented in the machine learning literature. Secondly, your first accuracy figure is optimistically biased: you tested the model on the same data you trained it on, which inflates the score (it can approach 100%).
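To make the second point concrete, here is a minimal sketch that scores the same tree once on the data it was fitted on and once on held-out rows. The 90/60 cut and the `random_state` passed to `shuffle` are my own choices for reproducibility, not something taken from the question, so the exact numbers will not match the asker's run.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.tree import DecisionTreeClassifier
from sklearn.utils import shuffle

X, y = load_iris(return_X_y=True)
# Assumption: fixed seed here instead of np.random.seed, for a reproducible shuffle.
X, y = shuffle(X, y, random_state=1234)

clf = DecisionTreeClassifier(max_depth=2,
                             class_weight={0: 0.3, 1: 10, 2: 0.3},
                             random_state=1234)

# Accuracy on the very rows the tree was fitted on (optimistic estimate).
train_acc = accuracy_score(y, clf.fit(X, y).predict(X))

# Accuracy on 60 rows the tree never saw during fitting.
clf.fit(X[:90], y[:90])
test_acc = accuracy_score(y[90:], clf.predict(X[90:]))

print("fit and score on all rows:", train_acc)
print("fit on 90, score on 60:   ", test_acc)
```

The two numbers measure different things, so there is no reason to expect them to agree; only the held-out one says anything about unseen data.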

https://www.analyticsvidhya.com/blog/2021/05/4-ways-to-evaluate-your-machine-learning-model-cross-validation-techniques-with-python-code/

https://towardsdatascience.com/train-test-split-c3eed34f763b
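One more thing worth checking in the question's manual version: a slice such as `X[train[0]:train[-1]]` stops before the last index, so it does not select the same rows as the `(train, test)` pair passed to `cv`. The sketch below, under the assumption that the split is rows 0 to 89 for training and 90 to 149 for testing, indexes with the full arrays instead and compares the result to `cross_val_score`. The use of `clone` and of `estimator.score` mirrors what I understand `cross_val_score` to do for a classifier with default scoring; the shuffle seed is again my own choice.

```python
import numpy as np
from sklearn.base import clone
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier
from sklearn.utils import shuffle

X, y = load_iris(return_X_y=True)
X, y = shuffle(X, y, random_state=1234)  # assumption: seed chosen for this sketch

train = np.arange(0, 90)        # rows 0..89
test = np.arange(90, len(X))    # rows 90..149

clf = DecisionTreeClassifier(max_depth=2,
                             class_weight={0: 0.3, 1: 10, 2: 0.3},
                             random_state=1234)

# One predefined split, passed as an iterable of (train, test) index pairs.
cv_score = cross_val_score(clf, X, y, cv=[(train, test)])[0]

# Manual equivalent: index with the whole arrays, not with a slice.
manual = clone(clf).fit(X[train], y[train])
manual_score = manual.score(X[test], y[test])

# The slices used in the question each lose their last row.
n_sliced_train = len(X[train[0]:train[-1]])
n_sliced_test = len(X[test[0]:test[-1]])

print(cv_score, manual_score)
print(n_sliced_train, n_sliced_test)  # 89 and 59 instead of 90 and 60
```

With the full index arrays the manual score should match `cross_val_score`, since the tree is deterministic for a fixed `random_state`. With the slices, the model is trained on 89 rows and tested on 59, which is a different split and can give a different accuracy.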
