
Understanding Support Vector Regression (SVR)

I'm working with SVR, using this resource. Everything is super clear with the epsilon-insensitive loss function (from the figure): the prediction comes with a tube that covers most of the training samples and generalizes the bounds, using support vectors.

[figures: the epsilon-insensitive loss function and the regression tube around the training samples]
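For reference (my own reconstruction, since the images are not reproduced here), the loss those figures illustrate is the epsilon-insensitive loss `L_eps(y, f(x)) = max(0, |y - f(x)| - epsilon)`: zero for points inside the tube of half-width epsilon, and growing linearly outside it.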

Then we have this explanation: this can be described by introducing (non-negative) slack variables to measure the deviation of training samples outside the epsilon-insensitive zone. I understand this error outside the tube, but I don't know how we can use it in the optimization. Could somebody explain this?

[figures: the slack variables for points outside the epsilon-insensitive zone, and the objective function with its constraints]
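For reference, the optimization problem those figures state (reconstructed here in the notation used in the answer below, writing the slack variables as p and p*) is the standard SVR primal:

minimize    1/2 * |w|^2  +  C * sum_i (p_i + p*_i)
subject to  y_i - f(x_i, w) <= epsilon + p_i
            f(x_i, w) - y_i <= epsilon + p*_i
            p_i >= 0,  p*_i >= 0,   for i = 1, ..., N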


Locally, I'm trying to implement a very simple optimization solution without libraries. This is what I have for the loss function.

import numpy as np

# Kernel func, linear by default
def hypothesis(x, weight, k=None):
    # Fall back to the identity (linear) kernel if none is given
    k = k if k else lambda z: z
    # Apply the kernel elementwise to the inputs
    k_x = np.vectorize(k)(x)
    # Linear model in the (kernelized) features
    return np.dot(k_x, np.transpose(weight))

.......

def boundary_loss(x, y, weight, epsilon):
    prediction = hypothesis(x, weight)

    # Absolute deviation of each prediction from its target
    scatter = np.absolute(np.transpose(y) - prediction)

    # Only deviations outside the epsilon tube contribute
    bound = lambda z: z if z >= epsilon else 0

    return np.sum(np.vectorize(bound)(scatter))
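A quick sanity check of that function on made-up values (toy numbers chosen for illustration, not real data):

x = np.array([[1.0], [2.0], [3.0]])
y = np.array([1.5, 1.9, 3.6])
weight = np.array([1.0])

# Deviations are [0.5, 0.1, 0.6]; with epsilon = 0.3 only the first and last
# fall outside the tube, so the reported loss is 0.5 + 0.6 = 1.1.
print(boundary_loss(x, y, weight, epsilon=0.3))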

First, let's look at the objective function. The first term, 1/2 * w^2 (wish this site had LaTeX support, but this will suffice), correlates with the margin of the SVM. The article you linked doesn't, in my opinion, explain this very well; it describes this term as capturing "the model's complexity", but perhaps this is not the best way of explaining it. Minimizing this term maximizes the margin (while still representing the data well), which is the predominant goal of using SVMs for regression.

Warning, math-heavy explanation: The reason this is the case is that when maximizing the margin, you want to find the "farthest" non-outlier point right on the margin and minimize its distance. Let this farthest point be x_n. We want to find its Euclidean distance d from the plane f(w, x) = 0, which I will rewrite as w^T * x + b = 0 (where w^T is just the transpose of the weights matrix so that we can multiply the two). To find the distance, let us first normalize the plane such that |w^T * x_n + b| = epsilon, which we can do WLOG as w is still able to form all possible planes of the form w^T * x + b = 0. Then, let's note that w is perpendicular to the plane. This is obvious if you have dealt a lot with planes (particularly in vector calculus), but it can be proven by choosing two points x_1 and x_2 on the plane, then noticing that w^T * x_1 + b = 0 and w^T * x_2 + b = 0. Subtracting the two equations, we get w^T * (x_1 - x_2) = 0. Since x_1 - x_2 is just any vector lying in the plane, and its dot product with w is 0, we know that w is perpendicular to the plane. Finally, to actually calculate the distance between x_n and the plane, we take the vector formed by x_n and some point x' on the plane (that vector is x_n - x') and project it onto w. Doing this, we get d = |w^T * (x_n - x')| / |w|, which we can rewrite as d = (1 / |w|) * |w^T * x_n - w^T * x'|, and then add and subtract b inside the absolute value to get d = (1 / |w|) * |w^T * x_n + b - w^T * x' - b|. Notice that w^T * x_n + b is epsilon (from our normalization above), and that w^T * x' + b is 0, as x' is just a point on our plane. Thus, d = epsilon / |w|. Notice that maximizing this distance subject to our constraint of finding the x_n and having |w^T * x_n + b| = epsilon is a difficult optimization problem. What we can do is restructure this optimization problem as minimizing 1/2 * w^T * w subject to the first two constraints in the picture you attached, that is, |y_i - f(x_i, w)| <= epsilon. You may think that I have forgotten the slack variables, and that is true, but when focusing on just this term and ignoring the second term, we ignore the slack variables for now; I will bring them back later. The reason these two optimizations are equivalent is not obvious, but the underlying reason lies in discrimination boundaries, which you are free to read more about (it's a lot more math that, frankly, I don't think this answer needs more of). Then, note that minimizing 1/2 * w^T * w is the same as minimizing 1/2 * |w|^2, which is the desired result we were hoping for. End of the heavy math.
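If it helps to see the d = epsilon / |w| identity numerically, here is a tiny check with made-up numbers (not part of the original answer): it builds a point x_n satisfying |w^T * x_n + b| = epsilon, measures its distance from the plane w^T * x + b = 0 by projecting onto w, and compares that with epsilon / |w|.

import numpy as np

# Arbitrary plane w^T x + b = 0 and tube width epsilon (toy values)
w = np.array([3.0, 4.0])          # |w| = 5
b = 1.0
epsilon = 0.5

# Pick a point x' on the plane, then step epsilon / |w| along the unit normal
x_plane = np.array([1.0, -1.0])   # w . x_plane + b = 3 - 4 + 1 = 0
x_n = x_plane + (epsilon / np.linalg.norm(w)) * (w / np.linalg.norm(w))

# Distance of x_n from the plane, via projection onto w
d = abs(np.dot(w, x_n - x_plane)) / np.linalg.norm(w)

print(abs(np.dot(w, x_n) + b))         # ~ epsilon, i.e. x_n sits on the margin
print(d, epsilon / np.linalg.norm(w))  # both ~ 0.1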

Now, notice that we want to make the margin big, but not so big that it includes noisy outliers like the one in the picture you provided.

Thus, we introduce a second term. To keep the margin down to a reasonable size, the slack variables are introduced (I will call them p and p* because I don't want to type out "psi" every time). These slack variables ignore everything in the margin, i.e. the points that do not harm the objective and are "correct" in terms of their regression status. However, the points outside the margin are outliers; they do not reflect well on the regression, so we penalize them simply for existing. The slack error function given there is relatively easy to understand: it just adds up the slack error of every point (p_i + p*_i) for i = 1, ..., N, and then multiplies by a modulating constant C which determines the relative importance of the two terms. A low value of C means that we are okay with having outliers, so the margin will be thinned and more outliers will be produced. A high value of C indicates that we care a lot about not having slack, so the margin will be made bigger to accommodate these outliers at the expense of representing the overall data less well.

A few things to note about p and p*. First, note that they are both always >= 0. The constraint in your picture shows this, but it also makes intuitive sense, as slack should only ever add to the error, so it is non-negative. Second, notice that if p > 0, then p* = 0 and vice versa, as an outlier can only be on one side of the margin. Last, all points inside the margin will have p and p* equal to 0, since they are fine where they are and thus do not contribute to the loss.
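Concretely (my own summary, not from the article), at the optimum the slacks are simply how far a point overshoots the tube on each side,

p_i  = max(0, (y_i - f(x_i, w)) - epsilon)    (target above the tube)
p*_i = max(0, (f(x_i, w) - y_i) - epsilon)    (target below the tube)

which is also why at most one of the two can be non-zero for any given point.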

Notice that with the introduction of the slack variables, if you have any outliers then you won't want the condition from the first term, that is, |w^T * x_n + b| = epsilon, as the x_n would be this outlier and your whole model would be screwed up. What we allow for, then, is to change the constraint to |w^T * x_n + b| = epsilon + (p + p*). When translated to the new optimization's constraint, we get the full constraint from the picture you attached, that is, |y_i - f(x_i, w)| <= epsilon + p + p*. (I combined the two equations into one here, but you could rewrite them as the picture has them and it would be the same thing.)

Hopefully, after covering all of this, the motivation for the objective function and the corresponding slack variables makes sense to you.


If I understand the question correctly, you also want code to calculate this objective/loss function, which I think isn't too bad. I have not tested this (yet), but I think this should be what you want.

import numpy as np

# Function for calculating the error/loss for an SVM. I assume that:
#  - 'x' is a 2d array representing the vectors of the data points
#  - 'y' is an array representing the values each vector actually gives
#  - 'weights' is an array of weights that we tune for the regression
#  - 'epsilon' is a scalar representing the breadth of our margin.
# (Predictions come from the hypothesis() function in your question.)
def optimization_objective(x, y, weights, epsilon):
    # Calculates first term of objective (note that norm^2 = dot product)
    margin_term = np.dot(weights, weights) / 2

    # Now calculate second term of objective. First get the sum of slacks.
    slack_sum = 0
    for i in range(len(x)):  # For each observation
        # First find the absolute distance between expected and observed.
        diff = abs(hypothesis(x[i], weights) - y[i])
        # Now subtract epsilon
        diff -= epsilon
        # If diff is still more than 0, then it is an 'outlier' and will have slack.
        slack = max(0, diff)
        # Add it to the slack sum
        slack_sum += slack

    # Now we have the slack_sum, so then multiply by C (I picked this as 1 arbitrarily)
    C = 1
    slack_term = C * slack_sum

    # Now, simply return the sum of the two terms, and we are done.
    return margin_term + slack_term

I got this function working on my computer with small data, and you may have to change it a little to work with your data if, for example, the arrays are structured differently, but the idea is there. Also, I am not the most proficient with Python, so this may not be the most efficient implementation, but my intent was to make it understandable.
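For illustration, here is the sort of small made-up check I mean (toy values only), reusing the hypothesis() function from the question:

x = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 3.0]])
y = np.array([1.5, 1.0, 4.0])
weights = np.array([0.5, 0.25])

# Prints the margin term plus the epsilon-insensitive slack sum for these points.
print(optimization_objective(x, y, weights, epsilon=0.5))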

Now, note that this just calculates the error/loss (whatever you want to call it). Actually minimizing it requires going into Lagrangians and intense quadratic programming, which is a much more daunting task. There are libraries available for doing this, but if you want to stay library-free as you are doing here, I wish you good luck, because that is not a walk in the park.
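That said, if you only want something crude and library-free to experiment with, one option (my own suggestion, not part of the answer above, and far less principled than solving the dual QP) is plain subgradient descent on the primal objective, which is convex. A minimal sketch, assuming a linear kernel and no bias term:

import numpy as np

# Subgradient descent on  1/2*|w|^2 + C * sum_i max(0, |y_i - w.x_i| - epsilon).
# Fixed step size, linear model only -- a demonstration, not a serious solver.
def svr_subgradient_descent(x, y, epsilon, C=1.0, lr=0.01, n_iters=2000):
    weights = np.zeros(x.shape[1])
    for _ in range(n_iters):
        residuals = y - x.dot(weights)
        # Only points outside the epsilon tube contribute to the slack subgradient
        outside = np.abs(residuals) > epsilon
        grad = weights - C * x[outside].T.dot(np.sign(residuals[outside]))
        weights -= lr * grad
    return weights

On tiny data this slowly drives the objective down, but a proper QP solver (which is what the libraries use, via the dual) is the reliable way to train an SVR.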

Finally, I would like to note that I got most of this information from notes I took in an ML class last year, and the professor (Dr. Abu-Mostafa) was a great help in getting me to learn the material. The lectures for this class are online (by the same professor), and the ones pertinent to this topic are here and here (although, in my very biased opinion, you should watch all the lectures; they were a great help). Leave a comment/question if you need anything cleared up or if you think I made a mistake somewhere. If you still don't understand, I can try to edit my answer to make more sense. Hope this helps!
