
How to recognize overfitting and underfitting in Python

I have a regression model. I wrote code for this algorithm:

Create 10 random splits of the training data into training and validation data. Choose the best value of alpha from the following set: {0.1, 1, 3, 10, 33, 100, 333, 1000, 3333, 10000, 33333}.

To choose the best alpha hyperparameter value, you have to do the following:

• For each value of the hyperparameter, perform 10 random splits of the training data into training and validation data, as described above.

• For each value of the hyperparameter, use its 10 random splits to find the average training and validation accuracy.

• On a graph, plot both the average training accuracy (in red) and the average validation accuracy (in blue) with respect to each hyperparameter setting. Comment on this graph by identifying regions of overfitting and underfitting.

• Print the best value of alpha hyperparameter.
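The tuning loop described above can be sketched as follows. `RidgeClassifier` and the synthetic `make_classification` data are stand-ins for my actual model and training set (a logistic regression exposes the same kind of regularization through `C = 1/alpha`):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import RidgeClassifier
from sklearn.model_selection import train_test_split

# Candidate alpha values from the assignment
alphas = [0.1, 1, 3, 10, 33, 100, 333, 1000, 3333, 10000, 33333]

# Stand-in data; replace with the real training set
X, y = make_classification(n_samples=500, n_features=20, n_informative=5,
                           random_state=0)

avg_train, avg_val = [], []
for alpha in alphas:
    train_scores, val_scores = [], []
    for seed in range(10):  # 10 random train/validation splits per alpha
        X_tr, X_val, y_tr, y_val = train_test_split(
            X, y, test_size=0.2, random_state=seed)
        clf = RidgeClassifier(alpha=alpha).fit(X_tr, y_tr)
        train_scores.append(clf.score(X_tr, y_tr))
        val_scores.append(clf.score(X_val, y_val))
    avg_train.append(np.mean(train_scores))
    avg_val.append(np.mean(val_scores))

# Best alpha = the one with the highest average validation accuracy
best_alpha = alphas[int(np.argmax(avg_val))]
print('best alpha:', best_alpha)

# Plot: average training accuracy (red) vs. validation accuracy (blue)
# import matplotlib.pyplot as plt
# plt.semilogx(alphas, avg_train, 'r-o', label='train')
# plt.semilogx(alphas, avg_val, 'b-o', label='validation')
# plt.xlabel('alpha'); plt.legend(); plt.show()
```

On the resulting plot, the region where both curves are low is underfitting (alpha too large, model too constrained), and the region where the training curve is high but the validation curve drops away is overfitting.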

2- Evaluate the prediction performance on test data and report the following:

• Total number of non-zero features in the final model.

• The confusion matrix.

• Precision, recall, and accuracy for each class.
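These three reports can be produced like this. The data and the model are stand-ins; I use an L1-penalized `LogisticRegression` here because an L1 penalty can drive coefficients exactly to zero, which makes "number of non-zero features" meaningful:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import train_test_split

# Stand-in data; plug in the real data and the tuned regularization strength
X, y = make_classification(n_samples=500, n_features=20, n_informative=5,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25,
                                                    random_state=0)

# L1 penalty so some coefficients can be driven exactly to zero
clf = LogisticRegression(penalty='l1', solver='liblinear', C=1.0)
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)

# Total number of non-zero features in the final model
print('non-zero features:', int(np.count_nonzero(clf.coef_)))

# Confusion matrix
cm = confusion_matrix(y_test, y_pred)
print(cm)

# Precision, recall, and per-class accuracy summary
print(classification_report(y_test, y_pred))
```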

Finally, discuss whether there is any sign of underfitting or overfitting, with appropriate reasoning.

I wrote this code:

from sklearn.metrics import classification_report

# Newclassifier, X_test, y_test, and y_pred are defined earlier in my script
print('Accuracy of logistic regression classifier on test set: {:.2f}'.format(Newclassifier.score(X_test, y_test)))
print(classification_report(y_test, y_pred))

My questions are: 1- Why does the accuracy decrease in each iteration? 2- Is my model overfitting or underfitting? 3- Does my model work correctly?

There is no official/absolute metric for deciding whether you are underfitting, overfitting, or neither. In practice:

  • Underfitting: your model is too simple. There won't be much difference between the training and validation scores, but accuracy will be fairly low on both.
  • Overfitting: your model is too complicated. Instead of learning the underlying patterns, it memorizes your training set. So the training error keeps decreasing, but the validation error starts increasing after some point.
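A quick way to see both regimes is to compare a deliberately simple model against a deliberately complex one. Here a decision stump versus an unpruned decision tree, on synthetic data; both are illustrative stand-ins, not your model:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=600, n_features=20, n_informative=5,
                           random_state=0)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.3, random_state=0)

results = {}
for depth in (1, None):  # a stump (too simple) vs. an unpruned tree (too complex)
    clf = DecisionTreeClassifier(max_depth=depth, random_state=0).fit(X_tr, y_tr)
    results[depth] = (clf.score(X_tr, y_tr), clf.score(X_val, y_val))

for depth, (tr, val) in results.items():
    print(f'max_depth={depth}: train={tr:.2f}, validation={val:.2f}, gap={tr - val:.2f}')
```

The stump scores similarly (and modestly) on both sets, which is the underfitting signature; the unpruned tree memorizes the training set and shows a larger train/validation gap, which is the overfitting signature.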

In your case, your training and validation errors seem to move in parallel, so you don't seem to have a problem with overfitting. Your model could be underfitting, so you could try a more complex model. However, it is also possible that this is as well as this algorithm can do on this particular training set. In most real problems, no algorithm can reach zero error.

As to why your error increases, I don't know how this particular algorithm works, but since it relies on random splits, some fluctuation is expected behavior. The error goes a bit up and down, but it does not steadily increase, so it doesn't look problematic.
