GridSearchCV does not improve my test accuracy
I am building several classifier models, and the test accuracy for all of them is 0.508.
I find it odd that multiple models have exactly the same accuracy. The models I used are LogisticRegression, DecisionTreeClassifier, MLPClassifier, RandomForestClassifier, BaggingClassifier, AdaBoostClassifier, XGBClassifier, SVC, and VotingClassifier.
After using GridSearchCV to tune the models, all of their training accuracy scores improved. But the test accuracy scores did not change.
I wish I could say I changed something, but I don't know why the test scores did not change. After using the grid search, I expected the test scores to improve, but they didn't.
To confirm: do you mean your training scores improved but your test scores did not change? If so, there are many possible explanations.
Looking at your accuracy, my first question is: are you performing a binary classification task? If so, your models are barely better than random on the test set, which may suggest that something is wrong with your training.
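One way to check the "barely better than random" suspicion is to compare a model against a trivial baseline and to look at which classes it actually predicts. The sketch below is only an illustration: the data is a synthetic stand-in generated with make_classification (the real dataset is not shown in the question), and LogisticRegression is just one of the listed models.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.dummy import DummyClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the asker's data (assumption: binary labels 0/1).
X, y = make_classification(
    n_samples=1000, n_features=20, n_informative=5,
    n_clusters_per_class=1, random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Baseline that always predicts the most frequent training class.
baseline = DummyClassifier(strategy="most_frequent").fit(X_train, y_train)
baseline_acc = baseline.score(X_test, y_test)

# One real model for comparison.
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
model_acc = model.score(X_test, y_test)

# How many test samples the model assigns to each class.
pred_counts = np.bincount(model.predict(X_test), minlength=2)

print(f"baseline accuracy: {baseline_acc:.3f}")
print(f"model accuracy:    {model_acc:.3f}")
print(f"predicted class counts: {pred_counts}")
```

If every model matches the baseline accuracy and one entry of pred_counts is zero, the models are probably all predicting a single class, which would explain identical scores across very different estimators.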
Otherwise, GridSearchCV, like RandomizedSearchCV and other hyperparameter optimization techniques, tries to find the optimal parameters within a range that you define. If, after optimization, an optimal parameter sits at one bound of its range, that suggests you should explore beyond that bound: deliberately set a new range and run the optimization again.
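The boundary check described above can be sketched roughly as follows. The data, the SVC estimator, the grid values, and the helper params_on_grid_edge are all hypothetical choices made for illustration; only GridSearchCV and its best_params_ attribute come from scikit-learn.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Synthetic stand-in data (assumption; the real dataset is not shown).
X, y = make_classification(n_samples=300, n_features=10, random_state=0)

# Hypothetical search ranges for an SVC.
param_grid = {"C": [0.01, 0.1, 1, 10], "gamma": [0.001, 0.01, 0.1]}
search = GridSearchCV(SVC(), param_grid, cv=5, scoring="accuracy")
search.fit(X, y)


def params_on_grid_edge(best_params, grid):
    """Return names of numeric parameters whose best value is the
    smallest or largest candidate in the grid."""
    on_edge = []
    for name, candidates in grid.items():
        if best_params[name] in (min(candidates), max(candidates)):
            on_edge.append(name)
    return on_edge


edge_params = params_on_grid_edge(search.best_params_, param_grid)
print("best parameters:", search.best_params_)
print("parameters at a bound of their range:", edge_params)
```

Any name printed in the last line is a candidate for a wider range: if the best C is the largest value tried, for example, extend the grid upward and search again.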
By the way, I don't know the size of your dataset, but if it is large I would recommend RandomizedSearchCV instead of GridSearchCV. Because it is not exhaustive, it takes less time and gives results that are (nearly) optimal.
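As a minimal sketch of that suggestion, assuming a RandomForestClassifier and some made-up parameter distributions (neither comes from the question), a randomized search samples a fixed number of candidates instead of trying every combination:

```python
from scipy.stats import randint
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV

# Synthetic stand-in data (assumption; the real dataset is not shown).
X, y = make_classification(n_samples=300, n_features=10, random_state=0)

# Hypothetical distributions; randint(low, high) samples integers in [low, high).
param_distributions = {
    "n_estimators": randint(20, 100),
    "max_depth": randint(2, 10),
    "min_samples_leaf": randint(1, 10),
}

search = RandomizedSearchCV(
    RandomForestClassifier(random_state=0),
    param_distributions,
    n_iter=10,          # only 10 sampled candidates are evaluated
    cv=3,
    scoring="accuracy",
    random_state=0,
)
search.fit(X, y)

print("best parameters:", search.best_params_)
print(f"best cross-validated accuracy: {search.best_score_:.3f}")
```

The cost is controlled by n_iter rather than by the size of the grid, so the ranges can be wide without the run time growing with every added value.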
There can be several reasons why the test accuracy didn't change after using GridSearchCV:
The best parameters found by GridSearchCV might not be optimal for the test data.
The test data may have a different distribution than the training data, leading to low test accuracy.
The models might be overfitting to the training data and not generalizing well to the test data.
The test data size might be small, leading to high variance in test accuracy scores.
The problem itself might be challenging, and a test accuracy of 0.508 might be the best that can be achieved with the current models and data.
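To make the point about small test sets concrete, here is a rough sketch using the normal approximation to a binomial proportion. The test set sizes are hypothetical (the question does not state one), and accuracy_interval is a helper defined here for illustration, not a library function.

```python
import math


def accuracy_interval(accuracy, n_test, z=1.96):
    """Approximate 95% interval for an accuracy measured on n_test samples,
    using the normal approximation to the binomial distribution."""
    half_width = z * math.sqrt(accuracy * (1 - accuracy) / n_test)
    return max(0.0, accuracy - half_width), min(1.0, accuracy + half_width)


# Hypothetical test set sizes; the real size is not given in the question.
for n_test in (100, 1000, 10000):
    low, high = accuracy_interval(0.508, n_test)
    print(f"n_test={n_test:>5}: accuracy 0.508 is roughly in [{low:.3f}, {high:.3f}]")
```

When the interval includes 0.5, the measured accuracy cannot be distinguished from chance on a balanced binary problem.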
It would be useful to have more information about the data, the problem, and the experimental setup to diagnose the issue further.
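In the meantime, comparing training, cross-validation, and test accuracy side by side may help narrow down which of these reasons applies. This is a sketch on synthetic data with a DecisionTreeClassifier and a made-up max_depth grid, all of which are assumptions; return_train_score=True is what makes GridSearchCV record the training scores.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (assumption; the real dataset is not shown).
X, y = make_classification(n_samples=600, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

search = GridSearchCV(
    DecisionTreeClassifier(random_state=0),
    {"max_depth": [2, 4, 8, None]},   # hypothetical grid
    cv=5,
    scoring="accuracy",
    return_train_score=True,          # also record accuracy on the training folds
)
search.fit(X_train, y_train)

best = search.best_index_
train_acc = search.cv_results_["mean_train_score"][best]  # mean over training folds
cv_acc = search.best_score_                               # mean over validation folds
test_acc = search.score(X_test, y_test)                   # refit best model on held-out test set

print(f"train accuracy: {train_acc:.3f}")
print(f"CV accuracy:    {cv_acc:.3f}")
print(f"test accuracy:  {test_acc:.3f}")
```

A training score far above the cross-validation score points toward overfitting, while a cross-validation score far above the test score points toward a distribution difference between the two sets, a very small test set, or a problem in how the test set was prepared.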