
Using libsvm to understand SVM questions

I am posting two plots below, each with a question attached. Can anyone help me answer them? I am really stuck.

1. Training a simple linear SVM:
% svm-train -t 0 -c 100 data0 data0.model
% python drawBoundary.py data0
where: 
-t 0    -- use a linear kernel
-c 100  -- set "C" = 100, which means "overfit a lot"

Here is the resulting plot: [figure: linear SVM decision boundary for data0]

This is an easily separable dataset, which is reflected in the small number of support vectors. In the plot, the SVs are drawn large; they lie on the margin (the dashed lines, one unit away from the decision boundary, which is the solid line).

Here is the first question:

Q1: You should have found that it takes 3 support vectors. Could you have fewer (e.g., 2) support vectors here? And why?

My answer is no: 3 is the smallest number. But that is only my intuition, and I don't know why. Could somebody explain the reason to me?

2. Training an RBF SVM:
% svm-train -t 2 -c 100 -g 100 data0 data0.model
% python drawBoundary.py data0
where:
-t 2    -- use an RBF kernel
-g 100  -- set gamma = 100
A gamma of 100 means that you have to be really close to a point for the kernel value to be noticeably above zero.
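
To make the effect of gamma concrete, here is a minimal sketch of the RBF kernel in the form libsvm uses, exp(-gamma * ||u - v||^2), evaluated at a few distances. The function name `rbf_kernel` and the sample distances are illustrative assumptions, not part of the original exercise.

```python
import math

def rbf_kernel(u, v, gamma):
    """RBF kernel in the libsvm form: exp(-gamma * ||u - v||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-gamma * sq_dist)

origin = (0.0, 0.0)
# Illustrative distances from the origin (assumed values, not taken from data0).
for dist in (0.01, 0.05, 0.1, 0.2, 0.5):
    k = rbf_kernel(origin, (dist, 0.0), gamma=100)
    print(f"distance {dist:>4}: K = {k:.3g}")
```

With gamma = 100, a point at distance 0.1 still has a kernel value of exp(-1), roughly 0.37, while at distance 0.5 the value is exp(-25), which is practically zero. Each training point therefore only influences a very small neighbourhood around itself.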

Here is the resulting plot: [figure: RBF SVM decision boundary for data0 with gamma = 100]

Here is the second question:

Q2: Why do you get these little blobs? How high do you have to turn gamma up in order to get a little decision boundary around each example (i.e., each decision boundary surrounds exactly one example)?

I am completely lost on this question.

For the first question, the answer is 3 points: you need two points on the same side to define one margin line, and one point on the other side to fix the second line, which is parallel to the first. These 3 points are the ones that give the maximum margin (i.e., the largest distance between the two parallel lines). If fewer than 3 points touched the margin lines here, there would still be a way to increase the distance between the two parallel lines, so that would not be the solution we are looking for.
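
The argument above can be checked numerically on a small example. The sketch below uses three hypothetical 2D points (not the data0 file) and compares the geometric margin of a line that touches all three points with lines determined by only two of them. The helper names `geometric_margin` and `bisector` are made up for this illustration.

```python
import math

# Hypothetical toy data (not the data0 file): two +1 points and one -1 point.
A, B, C = (0.0, 0.0), (2.0, 0.0), (1.0, 2.0)
data = [(A, +1), (B, +1), (C, -1)]

def geometric_margin(w, b, samples):
    """Smallest signed distance y * (w.x + b) / ||w|| over all samples."""
    norm = math.hypot(*w)
    return min(y * (w[0] * x[0] + w[1] * x[1] + b) / norm for x, y in samples)

def bisector(p_pos, p_neg):
    """Line halfway between one +1 point and one -1 point, normal to their difference."""
    w = (p_pos[0] - p_neg[0], p_pos[1] - p_neg[1])
    mid = ((p_pos[0] + p_neg[0]) / 2, (p_pos[1] + p_neg[1]) / 2)
    b = -(w[0] * mid[0] + w[1] * mid[1])
    return w, b

# Candidate 1: the line y = 1, whose margin lines touch all three points.
margin_three = geometric_margin((0.0, -1.0), 1.0, data)

# Candidates 2 and 3: lines determined by only two of the points.
margin_ac = geometric_margin(*bisector(A, C), data)
margin_bc = geometric_margin(*bisector(B, C), data)

print(margin_three, margin_ac, margin_bc)
```

For this layout, the line touching all three points has a margin of 1, while each two-point bisector only reaches about 0.22 because the remaining point ends up inside the margin. This is only a check on one toy configuration; other point layouts can behave differently.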

For the second question, I guess you have to try several gamma values to answer. Gamma is usually chosen from a set of powers of 10, e.g. {1, 10, 100} = {10^0, 10^1, 10^2}, and the value is picked by cross-validation to fit the data. Doing so gives an efficient SVM that neither overfits nor underfits.
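
As a rough way to explore this without retraining each time, the sketch below sweeps gamma over powers of 10 and reports the largest RBF kernel value between any two distinct points. The points are hypothetical (not data0), and the helper `max_cross_kernel` is an assumption made for this illustration. Once that value is negligible, each training point effectively only "sees" itself, which is consistent with a separate blob around every example. This is a heuristic indicator only; the actual boundary also depends on C and on the trained coefficients.

```python
import math
from itertools import combinations

# Hypothetical 2D training points (not the data0 file from the question).
points = [(0.10, 0.20), (0.15, 0.25), (0.60, 0.70), (0.65, 0.40), (0.90, 0.10)]

def rbf_kernel(u, v, gamma):
    """RBF kernel in the libsvm form: exp(-gamma * ||u - v||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-gamma * sq_dist)

def max_cross_kernel(pts, gamma):
    """Largest kernel value between two distinct points."""
    return max(rbf_kernel(u, v, gamma) for u, v in combinations(pts, 2))

# Sweep gamma over powers of 10: 1, 10, 100, 1000, 10000.
gammas = [10 ** k for k in range(0, 5)]
for gamma in gammas:
    value = max_cross_kernel(points, gamma)
    print(f"gamma = {gamma:>6}: max K(x_i, x_j) = {value:.3g}")
```

The value that matters is set by the closest pair of points: the smaller their distance, the larger gamma must be before their kernel value drops to nearly zero. The final choice of gamma for a real model would still come from cross-validation, as described above.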

Hope this helps.
