
Checking for Overfitting and Underfitting in sklearn models

I am using sklearn's RandomForestClassifier as my classifier. I could not figure out how to evaluate overfitting and underfitting for sklearn models.

from sklearn.ensemble import RandomForestClassifier

model = RandomForestClassifier(n_estimators=1000, random_state=1, criterion='entropy', bootstrap=True, oob_score=True, verbose=1)
model.fit(X_train, y_train)

Currently, I am evaluating my model with other tools such as cross_val_score, confusion_matrix, classification_report, and PermutationImportance. Could someone please help me with this?

There are multiple ways to test for overfitting and underfitting. If you want to look specifically at train and test scores and compare them, you can do this with sklearn's cross_validate. According to the documentation, it returns a dictionary with test scores for the metrics you supply, plus train scores if you pass return_train_score=True.

Sample code:

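from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_validate

model = RandomForestClassifier(n_estimators=1000, random_state=1, criterion='entropy', bootstrap=True, oob_score=True, verbose=1)
cv_dict = cross_validate(model, X, y, return_train_score=True)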
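To make the comparison concrete, here is a minimal sketch of reading the train and test scores out of the dictionary that cross_validate returns. The synthetic data from make_classification, the smaller n_estimators=50, and cv=5 are assumptions chosen only so the example runs quickly on its own; substitute your own X, y and settings.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_validate

# Synthetic stand-in for the asker's X and y (an assumption for illustration).
X, y = make_classification(n_samples=300, n_features=10, random_state=1)

# Fewer trees than in the question, only to keep the example fast.
model = RandomForestClassifier(n_estimators=50, random_state=1, criterion='entropy')

# With return_train_score=True the result has both 'train_score' and 'test_score',
# each an array with one entry per fold (default metric: accuracy).
cv_dict = cross_validate(model, X, y, cv=5, return_train_score=True)

mean_train = cv_dict['train_score'].mean()
mean_test = cv_dict['test_score'].mean()
gap = mean_train - mean_test  # a large positive gap hints at overfitting

print(f"mean train accuracy: {mean_train:.3f}")
print(f"mean test accuracy:  {mean_test:.3f}")
print(f"train - test gap:    {gap:.3f}")
```

How large a gap counts as "too large" depends on the problem, so treat the number as a relative signal when comparing model settings rather than as a fixed threshold.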

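You can also simply create a hold-out test set with train_test_split, fit on the training portion, and compare the model's score on the training data with its score on the held-out test data. A training score far above the test score points to overfitting, while low scores on both point to underfitting.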
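The hold-out approach could look roughly like the sketch below. The synthetic data, the 25% test size, and n_estimators=50 are assumptions for illustration. Because the model in the question sets oob_score=True, the sketch also prints the fitted oob_score_ attribute, which gives another estimate on samples each tree did not see during training.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the asker's data (an assumption for illustration).
X, y = make_classification(n_samples=400, n_features=10, random_state=1)

# Hold out 25% of the rows; stratify keeps class proportions similar in both parts.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=1, stratify=y
)

model = RandomForestClassifier(
    n_estimators=50, random_state=1, criterion='entropy',
    bootstrap=True, oob_score=True,
)
model.fit(X_train, y_train)

train_acc = model.score(X_train, y_train)  # accuracy on data the model has seen
test_acc = model.score(X_test, y_test)     # accuracy on held-out data

print(f"train accuracy: {train_acc:.3f}")
print(f"test accuracy:  {test_acc:.3f}")
print(f"OOB accuracy:   {model.oob_score_:.3f}")
```

A single split gives a noisier estimate than cross-validation, so it is worth repeating with a different random_state before drawing conclusions.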

