Checking for Overfitting and Underfitting in sklearn models

Question

I am using sklearn's RandomForestClassifier as my classifier. I could not figure out how to evaluate overfitting and underfitting for sklearn models.

model = RandomForestClassifier(n_estimators=1000, random_state=1, criterion='entropy', bootstrap=True, oob_score=True, verbose=1)
model.fit(X_train, y_train)

Currently, I evaluate my model with other tools such as cross_val_score, confusion_matrix, classification_report and PermutationImportance. Could someone please help me with this?

Answer 1

There are multiple ways to test for overfitting and underfitting. If you want to look specifically at train and test scores and compare them, you can do this with sklearn's cross_validate. According to the documentation, it returns a dictionary containing the test scores for the metrics you supply, plus the train scores if you pass return_train_score=True.

Sample code:

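from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_validate
model = RandomForestClassifier(n_estimators=1000, random_state=1, criterion='entropy', bootstrap=True, oob_score=True, verbose=1)
cv_dict = cross_validate(model, X, y, return_train_score=True)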
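
To make the comparison concrete, here is a minimal sketch of reading the dictionary that cross_validate returns. The synthetic data from make_classification, the smaller n_estimators, cv=5 and the accuracy metric are assumptions chosen for illustration; they are not part of the original question.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_validate

# Synthetic stand-in data (assumption); replace with your own X and y.
X, y = make_classification(n_samples=500, n_features=20, random_state=1)

# Fewer trees than in the question, only to keep the example fast.
model = RandomForestClassifier(n_estimators=100, random_state=1, criterion='entropy')

# return_train_score=True adds a 'train_score' array next to 'test_score'.
cv_dict = cross_validate(model, X, y, cv=5, scoring='accuracy',
                         return_train_score=True)

train_mean = np.mean(cv_dict['train_score'])
test_mean = np.mean(cv_dict['test_score'])
gap = train_mean - test_mean

print(f"mean train accuracy: {train_mean:.3f}")
print(f"mean test accuracy:  {test_mean:.3f}")
print(f"train/test gap:      {gap:.3f}")
```

As a rough reading: a train score far above the test score points towards overfitting, while low scores on both point towards underfitting. Fully grown random forests often score close to 1.0 on their own training data, so the test score is usually the more informative of the two numbers.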

You can also simply create a hold-out test set with train_test_split, fit the model on the training portion, and compare its score on the training data with its score on the held-out test data.
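
The hold-out approach can be sketched as follows. The synthetic data, the 25% test size, the stratified split and the reduced n_estimators are assumptions for illustration. Because the question's model sets oob_score=True, the sketch also prints the out-of-bag score, which gives an extra estimate of generalization without touching the test set.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (assumption); replace with your own X and y.
X, y = make_classification(n_samples=500, n_features=20, random_state=1)

# Hold out 25% of the rows; the model never sees them during fitting.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=1, stratify=y)

model = RandomForestClassifier(n_estimators=100, random_state=1,
                               criterion='entropy', bootstrap=True,
                               oob_score=True)
model.fit(X_train, y_train)

# score() returns mean accuracy for classifiers.
train_acc = model.score(X_train, y_train)
test_acc = model.score(X_test, y_test)
oob_acc = model.oob_score_  # estimated from samples left out of each bootstrap

print(f"train accuracy: {train_acc:.3f}")
print(f"test accuracy:  {test_acc:.3f}")
print(f"OOB accuracy:   {oob_acc:.3f}")
```

If the test and out-of-bag accuracies sit well below the training accuracy, the model is likely overfitting; if all three are low, it is likely underfitting.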

solution1
2 ACCEPTED 2020-02-09 14:47:46