
Does it make sense to have negative epsilon when tuning a linear-SVM model in R?

I'm using the following tuning code to find the best cost and epsilon for my svm model.

tuneResult <- tune(
    svm, 
    labels ~ ., 
    data = dataset, 
    ranges = list(epsilon = seq(-5.0, 5, 0.1), cost = 2^(0:3)))

But surprisingly, it suggests cost = 4 and epsilon = -5!

Then I trained the model using these parameters and tested with confusionMatrix. Unfortunately, the model is less accurate than a model trained without these parameters.

model1 <-  svm(labels ~ ., data = dataset, kernel = "linear", cost = 4 , epsilon = -5)
model2 <-  svm(labels ~ ., data = dataset, kernel = "linear")

Am I missing something here?

tldr;

The issue is in your tuneResult command: you allow epsilon to vary over the range [-5, +5], which makes no sense, as epsilon is only defined for values >= 0. The fact that tuneResult returns epsilon = -5 suggests a convergence failure/issue when trying to find an optimal set of (hyper)parameters. Unfortunately, without (sample) data it is hard to get a feeling for any (potential) computational challenges in the classification model.
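As a quick sketch of the fix, the search range for epsilon can simply be restricted to non-negative values. The data below is simulated (a stand-in for your dataset, whose structure I don't know), and the range seq(0, 1, 0.1) is an illustrative choice, not a recommendation:

```r
# tune() and svm() come from e1071
library(e1071)

# Illustrative data: two Gaussian clusters (stand-in for your 'dataset')
set.seed(1)
x <- rbind(matrix(rnorm(10 * 2, mean = 0), ncol = 2),
           matrix(rnorm(10 * 2, mean = 2), ncol = 2))
df <- data.frame(x = x, labels = as.factor(c(rep(-1, 10), rep(1, 10))))

# Restrict epsilon to valid (non-negative) values
tuneResult <- tune(
    svm,
    labels ~ .,
    data = df,
    ranges = list(epsilon = seq(0, 1, 0.1), cost = 2^(0:3)))

tuneResult$best.parameters
```

The selected epsilon is now guaranteed to lie in the valid, non-negative part of the parameter space.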


The role/interpretation of epsilon

Just to recap: in SVMs, epsilon describes the tolerance margin (the "insensitivity zone") within which errors are not penalised (take a look at ?e1071::svm to find the default value for epsilon). In the limit of epsilon approaching zero from the right, all errors are penalised, resulting in a maximal number of support vectors (as a function of epsilon). See e.g. here for a lot more detail on the interpretation/definition of the various SVM (hyper)parameters.
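This effect of epsilon on the number of support vectors is easiest to see directly in an eps-regression fit (the mode in which e1071's epsilon-insensitive loss applies); the simulated data below is purely illustrative:

```r
library(e1071)

# Simulated regression data: noisy linear relationship
set.seed(1)
x <- matrix(rnorm(100), ncol = 1)
y <- 2 * x[, 1] + rnorm(100, sd = 0.2)

# With a numeric response, svm() defaults to eps-regression
fit_narrow <- svm(x, y, kernel = "linear", epsilon = 0.01)  # narrow insensitivity zone
fit_wide   <- svm(x, y, kernel = "linear", epsilon = 1)     # wide insensitivity zone

# A narrower zone leaves more points outside it, so more points
# become support vectors
c(narrow = fit_narrow$tot.nSV, wide = fit_wide$tot.nSV)
```

Shrinking epsilon towards zero drives the support-vector count towards its maximum, consistent with the recap above.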

Hyperparameter optimisation and convergence

Let's return to the question of why the optimisation failed to converge: I think the issue arises from trying to simultaneously optimise both the cost and epsilon parameters. As epsilon gets smaller and smaller, you penalise misclassifications more and more (reducing the number of support vectors); at the same time, by allowing greater and greater cost values you allow more and more support vectors to be included to counter-balance the misclassifications from small epsilon values. During cross-validation this essentially drives the model towards ever smaller epsilon and ever larger cost hyperparameters.
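One way to sidestep this interaction, sketched here on simulated data, is to leave epsilon at its default and tune cost alone:

```r
library(e1071)

# Simulated two-class data (a stand-in for the real dataset)
set.seed(1)
x <- rbind(matrix(rnorm(10 * 2, mean = 0), ncol = 2),
           matrix(rnorm(10 * 2, mean = 2), ncol = 2))
df <- data.frame(x = x, y = as.factor(c(rep(-1, 10), rep(1, 10))))

# Tune cost only; extra arguments (here kernel) are passed through to svm()
tuneCost <- tune(
    svm,
    y ~ .,
    data = df,
    kernel = "linear",
    ranges = list(cost = 2^(0:3)))

tuneCost$best.parameters$cost
```

With only one hyperparameter varying, the cross-validation search can no longer play cost and epsilon off against each other.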

An example

We can reproduce this behaviour using some simulated data for an SVM classification problem.

  1. Let's generate some sample data

     # Sample data
     set.seed(1)
     x <- rbind(
         matrix(rnorm(10 * 2, mean = 0), ncol = 2),
         matrix(rnorm(10 * 2, mean = 2), ncol = 2))
     y <- c(rep(-1, 10), rep(1, 10))
     df <- data.frame(x = x, y = as.factor(y))
  2. Let's simultaneously tune the epsilon and cost hyperparameters. We use the same ranges as in your original post, including the nonsensical (i.e. negative) epsilon values.

     # Tune epsilon and cost hyperparameters (tune() comes from e1071)
     library(e1071)
     tuneResult <- tune(
         svm,
         y ~ .,
         data = df,
         ranges = list(epsilon = seq(-5, 5, 0.01), cost = 2^(0:3)))
     #
     # Parameter tuning of 'svm':
     #
     # - sampling method: 10-fold cross validation
     #
     # - best parameters:
     #  epsilon cost
     #       -5    4
     #
     # - best performance: 0.1

    You can see how the epsilon and cost parameters are driven to the respective extremes of their search ranges.
