
Issue Validating RandomizedSearchCV Results

I start with a basic logistic regression using all default hyperparameters, and I get a score of 0.8855.

Next, I run a randomized search (RandomizedSearchCV) to find the best hyperparameters. According to the search, C=10 with Max_iterations=110 gives a score of 0.89.

When I run the logistic regression with these hyperparameters, I get a much better accuracy: 0.91!

Why am I not getting exactly the same number?

You will definitely not get exactly the same accuracy when you run the model again on your training set. When you use k-fold cross-validation to check the performance of a particular set of hyperparameters, the entire data is divided into k sets (folds); k-1 sets are used for training and the model is validated on the remaining set. This process is repeated k times, each time holding out a different set for validation. Finally, the scores from all k iterations are averaged, and that average is the accuracy reported in random_result.best_score_. The figure below explains the process.

[Figure: the k-fold cross-validation process]
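A minimal sketch of the fold-averaging described above, using scikit-learn. It assumes a synthetic dataset from make_classification (the question's data isn't shown) and assumes the question's Max_iterations corresponds to LogisticRegression's max_iter parameter. It uses 3 stratified folds to match the three sets in the figure; with an integer cv, scikit-learn uses StratifiedKFold for classifiers, and recent versions default to 5 folds.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Hypothetical stand-in for the question's training data
X, y = make_classification(n_samples=500, n_features=20, random_state=0)

# Hyperparameters reported by the search (assuming Max_iterations == max_iter)
params = {"C": 10, "max_iter": 110}

# 3 folds, like sets 1-3 in the figure; a fixed seed makes the splits reproducible
cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=0)

fold_scores = []
for train_idx, val_idx in cv.split(X, y):
    model = LogisticRegression(**params)
    model.fit(X[train_idx], y[train_idx])                     # train on k-1 folds
    fold_scores.append(model.score(X[val_idx], y[val_idx]))   # validate on the held-out fold

# The averaged validation accuracy is what a search reports as its CV score
manual_cv_score = np.mean(fold_scores)

# cross_val_score runs the same loop and returns one accuracy per fold
library_scores = cross_val_score(LogisticRegression(**params), X, y, cv=cv)

print("per-fold accuracy:", np.round(fold_scores, 4))
print("mean:", round(manual_cv_score, 4))
```

Each per-fold accuracy comes from a model that never saw its validation fold, which is why the mean can differ from a score measured on data the model was trained on.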

After finding the best set of hyperparameters, you fit the model on the entire training data, i.e. set 1, set 2 and set 3 together. Some variation is therefore expected: the model is now trained on more data than in any single fold, and you are evaluating it on that same training data rather than on a held-out fold. So what you observe is completely normal, expected behavior.
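The refit step can be checked directly. This is a sketch under the same assumptions (synthetic data from make_classification, Max_iterations taken as max_iter) plus a hypothetical search space for C and max_iter, since the question only reports the winning values. With the default refit=True, RandomizedSearchCV retrains the best hyperparameters on all the data passed to fit and exposes that model as best_estimator_, while best_score_ remains the cross-validated mean.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RandomizedSearchCV

# Hypothetical stand-in for the question's training data
X, y = make_classification(n_samples=500, n_features=20, random_state=0)

# Hypothetical search space (15 combinations); the question's real one is unknown
param_distributions = {
    "C": [0.01, 0.1, 1, 10, 100],
    "max_iter": [100, 110, 200],
}

search = RandomizedSearchCV(
    LogisticRegression(),
    param_distributions,
    n_iter=10,       # sample 10 of the 15 combinations
    cv=3,            # 3 folds, like sets 1-3 in the figure
    random_state=0,  # reproducible sampling
)
search.fit(X, y)  # refit=True (default): best params are retrained on all of X, y

# best_score_ is the mean validation accuracy of the best candidate across folds
cv_score = search.best_score_
assert np.isclose(cv_score, search.cv_results_["mean_test_score"][search.best_index_])

# Accuracy of the refit model on the same data it was trained on
train_score = search.best_estimator_.score(X, y)

print("best params:", search.best_params_)
print(f"CV mean (best_score_):          {cv_score:.4f}")
print(f"refit model on training data:   {train_score:.4f}")
```

The two printed numbers will generally differ, much like 0.89 vs 0.91 in the question. best_score_ is usually the more realistic estimate of performance on unseen data, because the second number is measured on data the refit model has already seen.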

Hope this helps!
