
How to choose C and gamma AFTER grid search using libSVM (RBF kernel) for best possible generalisation?

I am aware of the abundance of questions asking about choosing the 'best' C and gamma values for an SVM (RBF kernel). The standard answer is a grid search; however, my question starts after the results of the grid search. Let me explain:

I have a data set of 10 subjects on which I perform leave-one-subject-out cross-validation, meaning I perform a grid search for each left-out subject. In order not to optimise on this training data, I do not want to choose the best C and gamma parameters by averaging the accuracy over all 10 models and searching for the maximum. Considering one model within the cross-validation, I could perform another cross-validation only on the training data within this model (not involving the left-out validation subject). But you can imagine the computational effort, and I do not have enough time for this at the moment.

Since the grid search for each of the 10 models resulted in a wide range of good C and gamma parameters (the accuracies differ by only 2-4%, see Figure 1), I thought about a different approach.

I defined a region within the grid which only contains the accuracies that are within 2% of the maximum accuracy of that grid. All other accuracy values, with a difference higher than 2%, are set to zero (see Figure 2). I do this for every model and build the intersection of the regions of all models. This results in a much smaller region of C and gamma values that would produce accuracies within 2% of the maximum accuracy for each model. However, the range is still rather big. So I thought about choosing the C-gamma pair with the lowest C, as this would mean that I am furthest away from overfitting and closer to good generalisation. Can I argue like that?
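The region-intersection idea above can be sketched as follows. The accuracy grids and the C/gamma axes are made-up toy values (two models instead of ten, purely for illustration); the selection logic is the part that matters:

```python
import numpy as np

# Hypothetical accuracy grids (%), one per leave-one-subject-out model;
# rows index C, columns index gamma. Two toy models for illustration.
C_values = [0.1, 1.0, 10.0]
gamma_values = [0.01, 0.1, 1.0]
acc_grids = [
    np.array([[90, 92, 88],
              [91, 93, 89],
              [85, 86, 84]], dtype=float),
    np.array([[89, 93, 90],
              [92, 91, 87],
              [80, 82, 81]], dtype=float),
]

# For each model, keep only the cells within 2% of that model's best accuracy.
masks = [grid >= grid.max() - 2.0 for grid in acc_grids]

# Intersect the near-optimal regions across all models.
common = np.logical_and.reduce(masks)

# Among the surviving (C, gamma) pairs, pick the smallest C
# (ties broken by smallest gamma) as the pair least prone to overfitting.
candidates = [(C_values[i], gamma_values[j]) for i, j in zip(*np.nonzero(common))]
best_C, best_gamma = min(candidates)
```

With real data the intersection can turn out empty, in which case the 2% threshold would need to be relaxed.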


How would I generally choose a C and gamma within this region of C-gamma pairs, all of which proved to be reliable settings for my classifier in all 10 models? Should I focus on minimising the C parameter? Or should I focus on minimising both the C AND the gamma parameter?


I found a related answer here ( Are high values for c or gamma problematic when using an RBF kernel SVM? ) that says a combination of high C AND high gamma would mean overfitting. I understood that the value of gamma changes the width of the Gaussian curve around data points, but I still can't get my head around what it practically means within a data set.
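One way to make gamma concrete: the RBF kernel value between two points at Euclidean distance d is exp(-gamma * d^2), i.e. the "similarity" the SVM assigns them. A minimal sketch (the distances and gamma values are arbitrary):

```python
import math

def rbf_kernel(d, gamma):
    """RBF kernel similarity for two points at Euclidean distance d."""
    return math.exp(-gamma * d * d)

# At the same distance, a larger gamma shrinks the similarity toward zero,
# so each training point only influences a small neighbourhood and the
# decision boundary can wrap tightly around individual points (overfitting).
# A small gamma gives each point broad influence and a smoother boundary.
d = 1.0
low = rbf_kernel(d, gamma=0.1)    # broad influence at distance 1
high = rbf_kernel(d, gamma=10.0)  # influence nearly gone at distance 1
```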

The post brought me to another idea. Could I use the ratio of the number of SVs to the number of data points as a criterion to choose between all the C-gamma pairs? Would a low (number of SVs / number of data points) mean a better generalisation? I am willing to lose accuracy, as it shouldn't affect the outcome I am interested in, if I get better generalisation in return (at least from a theoretical point of view).
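That selection rule (among near-optimal pairs, prefer the lowest support-vector ratio) can be sketched like this. All the numbers below are hypothetical grid-search results, not measurements; in practice the SV counts would come from the trained libSVM models:

```python
# Hypothetical grid-search results: each entry is
# (C, gamma, mean accuracy %, n_support_vectors / n_training_points).
results = [
    (0.1,  0.01, 85.0, 0.30),
    (1.0,  0.01, 86.5, 0.22),
    (1.0,  0.1,  87.0, 0.45),
    (10.0, 0.1,  87.5, 0.60),
]

best_acc = max(acc for _, _, acc, _ in results)

# Keep the pairs within 2% of the best accuracy, then prefer the lowest
# SV ratio: fewer support vectors means a simpler decision function,
# which suggests better generalisation (the SV fraction upper-bounds the
# leave-one-out error).
near_optimal = [r for r in results if r[2] >= best_acc - 2.0]
C_sel, gamma_sel, _, _ = min(near_optimal, key=lambda r: r[3])
```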


Since the linear kernel is a special case of the RBF kernel, there is a method that first tunes C using a linear SVM, and then performs a (C, gamma) grid search around that value to save time.

http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.141.880&rep=rep1&type=pdf
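The two-stage procedure from the answer can be sketched as follows. The `cv_score` function here is a made-up placeholder standing in for a cross-validation run (e.g. `svm-train -v` in libSVM); everything about its shape is an assumption for illustration only:

```python
import itertools
import math

def cv_score(C, gamma=None):
    """Placeholder for cross-validated accuracy. In practice this would
    call libSVM with a linear kernel when gamma is None, RBF otherwise.
    This toy version simply peaks at C = 1 and gamma = 0.1 (made up)."""
    s = -abs(math.log10(C))
    if gamma is not None:
        s -= abs(math.log10(gamma) + 1.0)
    return s

# Stage 1: cheap 1-D search over C with a linear SVM.
C_grid = [10.0 ** e for e in range(-3, 4)]
C0 = max(C_grid, key=cv_score)

# Stage 2: refined (C, gamma) search in a small window around C0,
# instead of a full 2-D grid over all (C, gamma) combinations.
C_fine = [C0 * 10.0 ** e for e in (-1, 0, 1)]
gamma_grid = [10.0 ** e for e in range(-4, 1)]
best_C, best_gamma = max(itertools.product(C_fine, gamma_grid),
                         key=lambda cg: cv_score(*cg))
```

The saving comes from replacing one large 2-D grid with a 1-D sweep plus a small local 2-D grid.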
