
Underfitting, Overfitting, Good_Generalization

As part of my assignment I'm applying linear and lasso regressions, and here's Question 7.

Based on the scores from question 6, what gamma value corresponds to a model that is underfitting (and has the worst test set accuracy)? What gamma value corresponds to a model that is overfitting (and has the worst test set accuracy)? What choice of gamma would be the best choice for a model with good generalization performance on this dataset (high accuracy on both training and test set)?

Hint: Try plotting the scores from question 6 to visualize the relationship between gamma and accuracy. Remember to comment out the import matplotlib line before submission.

This function should return one tuple with the degree values in this order: (Underfitting, Overfitting, Good_Generalization) Please note there is only one correct solution.

I really need help; I can't think of a way to solve this last question. What code should I use to determine (Underfitting, Overfitting, Good_Generalization), and why?

Thanks,

Data set: http://archive.ics.uci.edu/ml/datasets/Mushroom?ref=datanews.io

Here's my code from question 6:

import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import validation_curve

def answer_six():
    # SVC requires kernel='rbf', C=1, random_state=0 as instructed.
    # C: penalty parameter of the error term.
    # random_state: seed of the pseudo-random number generator used
    # when shuffling the data for probability estimates.
    # The radial basis function (RBF) kernel is a popular kernel
    # function in kernelized learning algorithms, in particular
    # support vector machine classification.
    model = SVC(kernel='rbf', C=1, random_state=0)

    # Six gamma values spaced evenly on a log scale, from 1e-4 to 10.
    gamma = np.logspace(-4, 1, 6)

    # Build a validation curve over the gamma range, scored by
    # accuracy. X_subset and y_subset are defined earlier in the
    # assignment notebook.
    train_scores, test_scores = validation_curve(
        model, X_subset, y_subset,
        param_name='gamma', param_range=gamma, scoring='accuracy')

    # Mean score across the cross-validation folds (axis=1).
    sc = (train_scores.mean(axis=1), test_scores.mean(axis=1))

    return sc

answer_six()
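Following the assignment's hint, one way to see the relationship is to plot both mean-score arrays against gamma on a log axis. A minimal sketch; the two score arrays below are hypothetical placeholders standing in for whatever `answer_six()` actually returns on the mushroom data:

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line for interactive use
import matplotlib.pyplot as plt

gamma = np.logspace(-4, 1, 6)
# Hypothetical mean accuracies; substitute the tuple from answer_six().
train_means = np.array([0.56, 0.93, 0.99, 1.00, 1.00, 1.00])
test_means = np.array([0.57, 0.92, 0.98, 0.99, 0.98, 0.52])

# Log-scale x-axis so the evenly log-spaced gammas spread out evenly.
plt.semilogx(gamma, train_means, marker='o', label='training accuracy')
plt.semilogx(gamma, test_means, marker='o', label='test accuracy')
plt.xlabel('gamma')
plt.ylabel('accuracy')
plt.legend()
plt.savefig('validation_curve.png')
```

With the real scores, the low-gamma end of the plot shows both curves low (underfitting) and the high-gamma end shows the two curves pulling apart (overfitting).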

Well, make yourself familiar with overfitting. You are supposed to produce something like the classic figure (see any article on this topic) of training and test error plotted against model complexity.

On the left you have underfitting, on the right overfitting... Where both errors are low you have good generalisation.

And these things are a function of gamma (which here plays the role of a regularizer).

Overfitting: your model is wrong for the data; if the scatter doesn't match, switch from a linear model to a polynomial one, or to a support vector machine with a working kernel. Underfitting: your dataset is lacking; add new, ideally well-correlated data.

Check the numbers: compare the accuracy scores on the training and test sets. If both are high with no big difference between them, you are doing well; if the test score or the training score is low, you are facing overfitting or underfitting.
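That check can be written directly against the mean-score arrays. A sketch under the assumption that the scores look like a typical validation curve; the arrays here are hypothetical, and the real ones come from `answer_six()`:

```python
import numpy as np

gamma = np.logspace(-4, 1, 6)
# Hypothetical mean accuracies per gamma; replace with answer_six()'s output.
train_means = np.array([0.56, 0.93, 0.99, 1.00, 1.00, 1.00])
test_means = np.array([0.57, 0.92, 0.98, 0.99, 0.98, 0.52])

# Underfitting: even the training accuracy is low.
underfit = gamma[train_means.argmin()]
# Overfitting: training accuracy is high but test accuracy lags far behind,
# so pick the gamma with the largest train-test gap.
overfit = gamma[(train_means - test_means).argmax()]
# Good generalization: highest test accuracy (and close to training).
good = gamma[test_means.argmax()]

print((underfit, overfit, good))
```

Reading the tuple off these placeholder arrays gives the smallest gamma for underfitting, the largest for overfitting, and an intermediate value for good generalization; the tuple from your real scores is what question 7 asks for.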

Hope that explains it.
