
Decision tree: is it overfitting?

I am building a decision tree classifier and I would like to check for, and fix, possible overfitting. These are the calculations:

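from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import cross_val_score

dtc = DecisionTreeClassifier(max_depth=3, min_samples_split=3, min_samples_leaf=1, random_state=0)
dtc_fit = dtc.fit(X_train, y_train)

# `score` (accuracy in %) is computed elsewhere; that code is not shown
print("Accuracy using Decision Tree:", round(score, 1), "%")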

('Accuracy using Decision Tree:', 92.2, '%')


scores = cross_val_score(dtc_fit, X_train, y_train, cv=5)
print("Accuracy: %0.2f (+/- %0.2f)" % (scores.mean(), scores.std() * 2))
Accuracy: 0.91 (+/- 0.10)

Which parameter values could I change to get a better result, or are these already fine?

Thank you for the help; I am a beginner, so I am unsure how to interpret these results.

I'm not sure whether it is overfitting, but you can give GridSearchCV a try, for the following reasons:

  • It will split your dataset into several train/validation folds and evaluate every parameter combination on them, so you get an idea of whether the decision tree is overfitting your training set (although this is not necessarily a reliable way of knowing).
  • You can search over several parameters at once by building a dictionary that maps each parameter to the values it can take, like this:

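     # sklearn.grid_search was removed in scikit-learn 0.20; GridSearchCV lives in sklearn.model_selection
     from sklearn.model_selection import GridSearchCV
     from sklearn.tree import DecisionTreeClassifier
     # Candidate values for each hyperparameter (fractions are relative to the number of samples)
     parameters_dict = {"max_depth": [2, 5, 6, 10],
                        "min_samples_split": [0.1, 0.2, 0.3, 0.4],
                        "min_samples_leaf": [0.1, 0.2, 0.3, 0.4],
                        "criterion": ["gini", "entropy"]}
     dtc = DecisionTreeClassifier(random_state=0)
     # Try every combination with 10-fold cross-validation
     grid_obj = GridSearchCV(estimator=dtc, param_grid=parameters_dict, cv=10)
     grid_obj.fit(X_train, y_train)
     # Extract the best classifier
     best_clf = grid_obj.best_estimator_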
  • You can also try recursive feature elimination with cross-validation (RFECV) to find the best features (this step is optional).

  • You can check other metrics, such as precision, recall, and F1-score, to get an idea of whether your decision tree is overfitting the data or favoring one class over the others.

  • As a side note, make sure your data does not suffer from class imbalance.
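The class-balance, metric, and feature-selection checks in the list above can be sketched as follows. This is a minimal sketch, assuming a binary classification problem: the data is a synthetic, deliberately imbalanced stand-in generated with `make_classification` (not the asker's data), and the held-out split `X_test`/`y_test` is an assumption, since the question only shows `X_train`/`y_train`.

```python
from collections import Counter

from sklearn.datasets import make_classification
from sklearn.feature_selection import RFECV
from sklearn.metrics import classification_report
from sklearn.model_selection import StratifiedKFold, train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic, deliberately imbalanced stand-in for the real data (assumption)
X, y = make_classification(n_samples=1000, n_features=20, n_informative=5,
                           weights=[0.8, 0.2], random_state=0)
# Hypothetical held-out split; stratify keeps the class ratio in both parts
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

# 1. Class balance: how many training samples does each class have?
class_counts = Counter(y_train)
print("Class counts in training set:", dict(class_counts))

# 2. Per-class precision, recall and F1-score on the held-out data
dtc = DecisionTreeClassifier(max_depth=3, min_samples_split=3,
                             min_samples_leaf=1, random_state=0)
dtc.fit(X_train, y_train)
y_pred = dtc.predict(X_test)
print(classification_report(y_test, y_pred))

# 3. Optional: recursive feature elimination with cross-validation
rfecv = RFECV(estimator=DecisionTreeClassifier(max_depth=3, random_state=0),
              step=1, cv=StratifiedKFold(5), scoring="f1")
rfecv.fit(X_train, y_train)
print("Optimal number of features:", rfecv.n_features_)
print("Selected feature mask:", rfecv.support_)
```

If the minority class shows much lower recall than the majority class in the report, the high overall accuracy may mostly reflect the majority class rather than a genuinely good fit.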

This is not an exhaustive list, nor necessarily the best way to check for overfitting, but you can give it a try.
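One more check that fits alongside these suggestions is to compare training accuracy with cross-validated (validation) accuracy: a tree that scores much higher on the data it was trained on than on the held-out folds is a typical sign of overfitting. The sketch below uses synthetic data in place of the asker's `X_train`/`y_train` (an assumption) and `cross_validate` with `return_train_score=True`, which reports both scores per fold; the list of depths is an arbitrary illustration.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_validate
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for X_train / y_train (assumption: replace with your own data)
X_train, y_train = make_classification(n_samples=500, n_features=20,
                                       n_informative=5, random_state=0)

results = {}
for depth in [1, 3, 5, 10, None]:  # None lets the tree grow until leaves are pure
    dtc = DecisionTreeClassifier(max_depth=depth, random_state=0)
    cv = cross_validate(dtc, X_train, y_train, cv=5, return_train_score=True)
    train_acc = cv["train_score"].mean()  # accuracy on the training folds
    val_acc = cv["test_score"].mean()     # accuracy on the held-out folds
    results[depth] = (train_acc, val_acc)
    # A large gap between training and validation accuracy suggests overfitting
    print(f"max_depth={depth}: train={train_acc:.3f}, "
          f"validation={val_acc:.3f}, gap={train_acc - val_acc:.3f}")
```

Typically the gap widens as `max_depth` grows; a depth where validation accuracy stops improving while training accuracy keeps rising is a reasonable place to stop.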

The technical posts on this site follow the CC BY-SA 4.0 license. If you need to reprint, please indicate the site URL or the original address. For any questions, please contact: yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM