
Getting the best accuracy on the testing dataset

Original post: 2019-11-27 14:17:51 | Tags: python / machine-learning / random-forest

I am using a random forest classifier to classify data into 4 labels. The model is trained on 20 features in total. I observe an accuracy of around 45-47% on the testing dataset, although predicting on the training dataset gives 100% accuracy. I am also using the best parameters found with a grid search. Can anyone explain why there is such a gap between training and testing predictions? How can I improve this?

PS: I'm new to machine learning.

1 Solution

There might be many reasons.

1) One possibility is that the model is overfitting. You can try hyperparameter optimization to find the values at which your model performs better.

2) Since you are using accuracy as the performance metric, check whether the dataset is balanced. If the dataset is imbalanced, you can use the ROC curve and AUC instead.
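Both suggestions can be tried in a few lines of scikit-learn. The following is only a rough sketch, not the asker's actual setup: the data is synthetic (generated with `make_classification` to mimic 20 features and 4 labels), and the parameter grid values are illustrative assumptions. It counts the classes, searches over parameters that limit tree complexity (a common way to reduce overfitting in a random forest), and reports the train/test gap together with a multi-class ROC AUC.

```python
from collections import Counter

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in for the real data (assumption): 20 features, 4 labels.
X, y = make_classification(
    n_samples=2000, n_features=20, n_informative=8, n_classes=4, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Point 2: check whether the labels are balanced before trusting accuracy.
class_counts = Counter(y_train)
print("Samples per class:", dict(class_counts))

# Point 1: tune parameters that restrict how closely each tree can fit
# the training data. The grid values here are illustrative only.
param_grid = {"max_depth": [4, 8, None], "min_samples_leaf": [1, 5, 20]}
search = GridSearchCV(
    RandomForestClassifier(n_estimators=50, random_state=0),
    param_grid,
    cv=5,  # score on held-out folds, not on the data used for fitting
    scoring="accuracy",
)
search.fit(X_train, y_train)
model = search.best_estimator_
print("Best parameters:", search.best_params_)

# A large gap between these two numbers is the usual sign of overfitting.
train_acc = accuracy_score(y_train, model.predict(X_train))
test_acc = accuracy_score(y_test, model.predict(X_test))
print(f"Train accuracy: {train_acc:.3f}, test accuracy: {test_acc:.3f}")

# Multi-class ROC AUC (one-vs-rest) computed from predicted probabilities.
test_auc = roc_auc_score(y_test, model.predict_proba(X_test), multi_class="ovr")
print(f"Test ROC AUC (one-vs-rest): {test_auc:.3f}")
```

If the test accuracy stays low even after the trees are constrained, the features may simply not carry enough signal for the 4 labels, which tuning alone cannot fix.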