
Getting a 100% Training Accuracy, but 60% Testing accuracy

I am trying different classifiers with different parameters on a dataset provided to us as part of a course project. We have to get the best performance we can on the dataset. The dataset is a reduced version of the Online News Popularity dataset.

I have tried an SVM, a random forest, and an SVM with cross-validation (k = 5). They all give approximately 100% training accuracy, while the testing accuracy is between 60% and 70%. I think the testing accuracy is fine, but the training accuracy bothers me. I would guess it is a case of overfitting, but none of my classmates seem to be getting similar results, so maybe the problem is with my code.

Here is the code for my cross-validation and random forest classifier. I would be very grateful if you could help me find out why I am getting such a high training accuracy.

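import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import StratifiedKFold
from sklearn.svm import SVC


def crossValidation(X_train, X_test, y_train, y_test, numSplits):
    # Stratified folds keep the class proportions similar in every split.
    skf = StratifiedKFold(n_splits=numSplits, shuffle=True)
    # Candidate values for the SVM hyperparameters C and gamma.
    Cs = np.logspace(-3, 3, 10)
    gammas = np.logspace(-3, 3, 10)

    # Mean and standard deviation of the dev-fold accuracy per (gamma, C) pair.
    ACC = np.zeros((10, 10))
    DEV = np.zeros((10, 10))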

    for i, gamma in enumerate(gammas):
        for j, C in enumerate(Cs):
            acc = []
            for train_index, dev_index in skf.split(X_train, y_train):
                X_cv_train, X_cv_dev = X_train[train_index], X_train[dev_index]
                y_cv_train, y_cv_dev = y_train[train_index], y_train[dev_index]
                clf = SVC(C=C, kernel='rbf', gamma=gamma)
                clf.fit(X_cv_train, y_cv_train)
                acc.append(accuracy_score(y_cv_dev, clf.predict(X_cv_dev)))

            ACC[i, j] = np.mean(acc)
            DEV[i, j] = np.std(acc)

    # Pick the (gamma, C) pair with the best mean dev accuracy and refit on all training data.
    i, j = np.argwhere(ACC == np.max(ACC))[0]
    clf1 = SVC(C=Cs[j], kernel='rbf', gamma=gammas[i], decision_function_shape='ovr')
    clf1.fit(X_train, y_train)
    y_predict_train = clf1.predict(X_train)
    y_pred_test = clf1.predict(X_test)
    print("Train Accuracy :: ", accuracy_score(y_train, y_predict_train))
    print("Test Accuracy  :: ", accuracy_score(y_test, y_pred_test))


def randomForestClassifier(X_train, X_test, y_train, y_test):
    """Fit a random forest with default settings and print train/test accuracy."""
    clf = RandomForestClassifier()
    clf.fit(X_train, y_train)
    y_predict_train = clf.predict(X_train)
    y_pred_test = clf.predict(X_test)
    print("Train Accuracy :: ", accuracy_score(y_train, y_predict_train))
    print("Test Accuracy  :: ", accuracy_score(y_test, y_pred_test))

There are two possible reasons why the training accuracy and the testing accuracy differ so much:

  1. The training data and the testing data have different distributions (because only part of the dataset was selected).
  2. The model is overfitting the training data.
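
One way to tell the two causes apart is to compare three numbers: training accuracy, cross-validated accuracy on the training set, and test accuracy. The sketch below does this on synthetic data from `make_classification`, which is only a stand-in for the course dataset; the sample sizes, noise level, and `random_state` values are assumptions chosen for illustration. If the cross-validated accuracy is close to the test accuracy while the training accuracy is much higher, overfitting is the more likely explanation. If the cross-validated accuracy is clearly higher than the test accuracy, the train and test sets may be distributed differently.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score, train_test_split

# Synthetic stand-in for the course dataset (an assumption, not the real data).
X, y = make_classification(n_samples=600, n_features=40, n_informative=5,
                           flip_y=0.2, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

clf = RandomForestClassifier(random_state=0)

# Accuracy estimated on held-out folds of the training set only.
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
cv_acc = cross_val_score(clf, X_train, y_train, cv=skf, scoring="accuracy").mean()

# Accuracy on the data the model was fitted on, and on the untouched test set.
clf.fit(X_train, y_train)
train_acc = clf.score(X_train, y_train)
test_acc = clf.score(X_test, y_test)

print(f"Train accuracy: {train_acc:.3f}")
print(f"CV accuracy:    {cv_acc:.3f}")
print(f"Test accuracy:  {test_acc:.3f}")
```

Note that `RandomForestClassifier` grows its trees to full depth by default (`max_depth=None`), so a training accuracy near 100% is common for it and is not necessarily a sign of a bug in the code.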

Since you already apply cross-validation, you should think about another solution. I recommend applying feature selection or feature reduction (such as PCA) to tackle the overfitting problem.
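
As a minimal sketch of the feature-reduction suggestion, PCA can be placed in a scikit-learn `Pipeline` ahead of the SVM so that the projection is fitted only on the training folds during the search. The synthetic data, the parameter grid, and the step names (`scale`, `pca`, `svm`) are assumptions made for illustration, not values tuned for the Online News Popularity data.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.model_selection import GridSearchCV, StratifiedKFold, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the course dataset (an assumption, not the real data).
X, y = make_classification(n_samples=400, n_features=40, n_informative=5,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

# Scale, project onto fewer components, then classify with an RBF SVM.
pipe = Pipeline([
    ("scale", StandardScaler()),
    ("pca", PCA()),
    ("svm", SVC(kernel="rbf")),
])

# Illustrative grid: number of PCA components plus the SVM's C and gamma.
param_grid = {
    "pca__n_components": [5, 10, 20],
    "svm__C": [0.1, 1, 10],
    "svm__gamma": [0.001, 0.01, 0.1],
}

skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
search = GridSearchCV(pipe, param_grid, cv=skf, scoring="accuracy")
search.fit(X_train, y_train)

print("Best parameters:", search.best_params_)
print("CV accuracy:    ", search.best_score_)
print("Train accuracy: ", search.score(X_train, y_train))
print("Test accuracy:  ", search.score(X_test, y_test))
```

For feature selection instead of reduction, the `pca` step could be swapped for something like `SelectKBest(f_classif)` from `sklearn.feature_selection`, with `k` searched in the same grid.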


