Does it make sense to have negative epsilon when tuning a linear-SVM model in R?
I'm using the following tuning code to find the best cost and epsilon for my svm model.
tuneResult <- tune(
svm,
labels ~ .,
data = dataset,
ranges = list(epsilon = seq(-5.0, 5, 0.1), cost = 2^(0:3)))
But surprisingly it suggests cost = 4 and epsilon = -5!
Then I trained the model using these parameters and tested it with confusionMatrix. Unfortunately, the model is not as accurate as a model without these parameters.
model1 <- svm(labels ~ ., data = dataset, kernel = "linear", cost = 4 , epsilon = -5)
model2 <- svm(labels ~ ., data = dataset, kernel = "linear")
Am I missing something here?
tldr;
The issue is in your tuneResult command, where you allow epsilon to vary in the range [-5, +5], which makes no sense as epsilon is only defined for values >= 0. The fact that tuneResult returns epsilon = -5 suggests a convergence failure/issue when trying to find an optimal set of (hyper)parameters. Unfortunately, without (sample) data it is hard to get a feeling for any (potential) computational challenges in the classification model.
The role/interpretation of epsilon
Just to recap: In SVMs, epsilon describes the tolerance margin (the "insensitivity zone") within which classification errors are not penalised (you should take a look at ?e1071::svm to find out about the default value for epsilon). In the limit of epsilon approaching zero from the right, all classification errors are penalised, resulting in a maximal number of support vectors (as a function of epsilon). See e.g. here for a lot more details on the interpretation/definition of the various SVM (hyper)parameters.
Hyperparameter optimisation and convergence
Let's return to the question why the optimisation convergence failed: I think the issue arises from trying to simultaneously optimise both the cost and epsilon parameters. As epsilon gets smaller and smaller, you penalise misclassifications more and more (reducing the number of support vectors); at the same time, by allowing for greater and greater cost parameters you allow for more and more support vectors to be included to counter-balance misclassifications from small epsilons. During cross-validation this essentially drives the model towards smaller and smaller epsilon and larger and larger cost hyperparameters.
An example
We can reproduce this behaviour using some simulated data for an SVM classification problem.
Let's generate some sample data:
# Sample data
set.seed(1)
x <- rbind(
    matrix(rnorm(10 * 2, mean = 0), ncol = 2),
    matrix(rnorm(10 * 2, mean = 2), ncol = 2))
y <- c(rep(-1, 10), rep(1, 10))
df <- data.frame(x = x, y = as.factor(y))
Let's simultaneously tune the epsilon and cost hyperparameters. We use the same ranges as in your original post, including the nonsensical (i.e. negative) epsilon values.
# tune epsilon and cost hyper-parameters
library(e1071)
tuneResult <- tune(
    svm,
    y ~ .,
    data = df,
    ranges = list(epsilon = seq(-5, 5, 0.01), cost = 2^(0:3)))
#
# Parameter tuning of 'svm':
#
# - sampling method: 10-fold cross validation
#
# - best parameters:
#  epsilon cost
#       -5    4
#
# - best performance: 0.1
You can see how the epsilon and cost parameters tend towards their respective minimal/maximal extremes.
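As a practical fix (this is a sketch, not part of the original post; it assumes the simulated df from above and the e1071 package), one can simply restrict epsilon to its valid non-negative range when tuning:

```r
# Restrict epsilon to non-negative values, so the search stays
# within the region where epsilon is actually defined.
library(e1071)
tuneResultFixed <- tune(
    svm,
    y ~ .,
    data = df,
    ranges = list(epsilon = seq(0, 1, 0.1), cost = 2^(0:3)))
summary(tuneResultFixed)
```

With the search space constrained like this, the reported best epsilon is at least a meaningful value, and you can then compare the tuned model against your untuned model2 on the same held-out data.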