Should my model always give 100% accuracy on the training dataset?

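from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB  # Multinomial Naive Bayes on lemmatized text

# df is the DataFrame from the question (text column 'Rejoined_Lemmatize', label column 'Product')
X_train, X_test, y_train, y_test = train_test_split(df['Rejoined_Lemmatize'], df['Product'], random_state=0)

tfidf = TfidfVectorizer()  # not defined in the original snippet; assumed to be a default TfidfVectorizer
X_train_counts = tfidf.fit_transform(X_train)
clf = MultinomialNB().fit(X_train_counts, y_train)
y_temp = clf.predict(tfidf.transform(X_train))  # predictions on the training set itself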

I am testing my model on the training dataset itself. It gives me the following results:

                          precision    recall  f1-score   support

               accuracy                           0.92    742500
              macro avg       0.93      0.92      0.92    742500
           weighted avg       0.93      0.92      0.92    742500

Is it acceptable to get accuracy < 100% on the training dataset?

Nope, you should not get 100% accuracy on your training dataset. If you do, it could mean that your model is overfitting.

TL;DR: yes, it is acceptable to get less than 100% on the training dataset; what matters is the performance on the testing dataset.

The most important question in classification (supervised learning) is generalization, that is, the performance in production (or on the testing dataset). The performance on your training dataset does not really matter, since that data is only used to fit your model. Once fitting is done, you will not use it again; only data that was not seen during training will be submitted to the model.
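To make the train/test distinction concrete, here is a minimal sketch that scores the same kind of classifier on both splits. The toy corpus and labels are hypothetical stand-ins for df['Rejoined_Lemmatize'] and df['Product'], and wrapping the vectorizer in a pipeline is my assumption, not something taken from the question.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Hypothetical toy corpus; it stands in for the question's text and label columns.
texts = [
    "credit card charge dispute",
    "late fee on credit card",
    "card interest rate increase",
    "stolen card fraud charge",
    "card annual fee refund",
    "credit limit on card lowered",
    "mortgage payment escrow error",
    "home loan refinance rate",
    "mortgage escrow account shortage",
    "loan modification foreclosure notice",
    "mortgage servicer lost payment",
    "home loan closing cost",
]
labels = ["card"] * 6 + ["mortgage"] * 6

X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.25, random_state=0, stratify=labels
)

# The pipeline fits TF-IDF on the training split only, then reuses it on the test split.
model = make_pipeline(TfidfVectorizer(), MultinomialNB())
model.fit(X_train, y_train)

train_acc = accuracy_score(y_train, model.predict(X_train))
test_acc = accuracy_score(y_test, model.predict(X_test))
print(f"train accuracy: {train_acc:.2f}, test accuracy: {test_acc:.2f}")
```

The test accuracy is the number to report; the training accuracy is mainly useful as a diagnostic when compared against it.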

A statistical model that is complex enough (that has enough capacity) can fit practically any training dataset perfectly and reach 100% accuracy on it. But a model that fits the training set perfectly tends to perform poorly on new data not seen during training (overfitting), so a perfect fit is not what you are after. You can therefore accept lower performance on the training dataset in exchange for better generalization, that is, better performance on data not used during training. This is called regularization.
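As an illustration of capacity versus regularization, the sketch below swaps in a decision tree (not the Naive Bayes model from the question) on synthetic noisy data, because tree depth is an easy capacity knob to show. The dataset sizes and the max_depth value are arbitrary choices of mine.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic data; flip_y=0.2 assigns random labels to a fraction of the samples,
# so a perfect fit on the training set has to memorize label noise.
X, y = make_classification(
    n_samples=600, n_features=20, n_informative=5, flip_y=0.2, random_state=0
)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Unlimited depth: enough capacity to memorize every training point.
deep = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)
# max_depth=3 caps the capacity, which acts as a simple form of regularization.
shallow = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_tr, y_tr)

for name, tree in [("deep", deep), ("shallow", shallow)]:
    print(name, "train:", tree.score(X_tr, y_tr), "test:", tree.score(X_te, y_te))
```

The unrestricted tree reaches 100% on the training split by construction; comparing the two test scores shows whether that perfect fit bought anything on unseen data.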

In your case, however, I am not sure whether MultinomialNB lets you control the regularization. You could try other sklearn classifiers, such as those proposed here.
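For what it is worth, MultinomialNB does take an additive smoothing parameter, alpha (default 1.0), which plays a role loosely similar to a regularization strength. A minimal sketch of tuning it with a grid search follows; the toy corpus, the candidate alpha values, and cv=3 are all assumptions made for illustration.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import GridSearchCV
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Hypothetical toy corpus standing in for the question's data.
texts = [
    "credit card charge dispute",
    "late fee on credit card",
    "card interest rate increase",
    "stolen card fraud charge",
    "card annual fee refund",
    "credit limit on card lowered",
    "mortgage payment escrow error",
    "home loan refinance rate",
    "mortgage escrow account shortage",
    "loan modification foreclosure notice",
    "mortgage servicer lost payment",
    "home loan closing cost",
]
labels = ["card"] * 6 + ["mortgage"] * 6

alphas = [0.01, 0.1, 1.0, 10.0]  # arbitrary candidate smoothing strengths
pipe = make_pipeline(TfidfVectorizer(), MultinomialNB())

# make_pipeline names the step after the lowercased class name: "multinomialnb".
search = GridSearchCV(pipe, {"multinomialnb__alpha": alphas}, cv=3)
search.fit(texts, labels)

print("best alpha:", search.best_params_["multinomialnb__alpha"])
print("mean cross-validated accuracy:", search.best_score_)
```

Larger alpha values smooth the per-class word probabilities more heavily, which usually lowers training accuracy and may or may not help on unseen data.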

I think it is better to use cross-validation to get a reliable estimate of your accuracy. Cross-validation is widely used to detect overfitting, since each fold is scored on data the model was not fitted on.

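from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

# X_train holds raw text, so TF-IDF is refit inside each fold via a pipeline
scores = cross_val_score(make_pipeline(TfidfVectorizer(), MultinomialNB()), X_train, y_train, cv=10)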

You can then report the mean score: scores.mean().
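If you also want the training score for each fold next to the validation score, cross_validate with return_train_score=True reports both. This is a sketch on a hypothetical toy corpus with cv=3; on the real data you would pass your own text and labels and probably keep cv=10.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import cross_validate
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Hypothetical toy corpus standing in for the question's data.
texts = [
    "credit card charge dispute",
    "late fee on credit card",
    "card interest rate increase",
    "stolen card fraud charge",
    "card annual fee refund",
    "credit limit on card lowered",
    "mortgage payment escrow error",
    "home loan refinance rate",
    "mortgage escrow account shortage",
    "loan modification foreclosure notice",
    "mortgage servicer lost payment",
    "home loan closing cost",
]
labels = ["card"] * 6 + ["mortgage"] * 6

pipe = make_pipeline(TfidfVectorizer(), MultinomialNB())
cv_results = cross_validate(pipe, texts, labels, cv=3, return_train_score=True)

train_scores = cv_results["train_score"]
val_scores = cv_results["test_score"]
print(f"train:      {train_scores.mean():.2f} +/- {train_scores.std():.2f}")
print(f"validation: {val_scores.mean():.2f} +/- {val_scores.std():.2f}")
```

A large gap between the two means is the usual sign of overfitting; reporting the standard deviation alongside the mean shows how stable the estimate is across folds.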
