
Gradient of a Loss Function for an SVM

I'm working through this class on convolutional neural networks. I've been trying to implement the gradient of a loss function for an SVM, and (I have a copy of the solution) I'm having trouble understanding why the solution is correct.

On this page (the CS231n course notes) the gradient of the loss function is defined as follows: [the original post embeds the equations from the CS231n course notes as an image; a paraphrase is given just after the code below]. In my code, my analytic gradient matches the numeric one when implemented as follows:

 dW = np.zeros(W.shape) # initialize the gradient as zero

  # compute the loss and the gradient
  num_classes = W.shape[1]
  num_train = X.shape[0]
  loss = 0.0
  for i in xrange(num_train):
    scores = X[i].dot(W)
    correct_class_score = scores[y[i]]
    for j in xrange(num_classes):
      if j == y[i]:
        continue
      margin = scores[j] - correct_class_score + 1 # note delta = 1
      if margin > 0:
        dW[:, y[i]] += -X[i]
        dW[:, j] += X[i] # gradient update for incorrect rows
        loss += margin
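For reference, a paraphrase of the gradient expressions from the notes (not a verbatim copy; $\mathbb{1}(\cdot)$ is the indicator function and $\Delta$ the margin, with $\Delta = 1$ in the code):

$$\nabla_{w_{y_i}} L_i = -\Big(\sum_{j \neq y_i} \mathbb{1}\big(w_j^{T} x_i - w_{y_i}^{T} x_i + \Delta > 0\big)\Big)\, x_i, \qquad \nabla_{w_j} L_i = \mathbb{1}\big(w_j^{T} x_i - w_{y_i}^{T} x_i + \Delta > 0\big)\, x_i \quad (j \neq y_i)$$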

However, it seems like, from the notes, that dW[:, y[i]] should be changed every time j == y[i], since we subtract the loss whenever j == y[i]. I'm very confused why the code is not:

  dW = np.zeros(W.shape) # initialize the gradient as zero

  # compute the loss and the gradient
  num_classes = W.shape[1]
  num_train = X.shape[0]
  loss = 0.0
  for i in xrange(num_train):
    scores = X[i].dot(W)
    correct_class_score = scores[y[i]]
    for j in xrange(num_classes):
      if j == y[i]:
        if margin > 0:
            dW[:, y[i]] += -X[i]
            continue
      margin = scores[j] - correct_class_score + 1 # note delta = 1
      if margin > 0:
        dW[:, j] += X[i] # gradient update for incorrect rows
        loss += margin

and the loss would change when j == y[i]. Why are they both being computed when j != y[i]?

I don't have enough reputation to comment, so I am answering here. Whenever you compute the loss vector for x[i], the i-th training example, and get some nonzero loss, that means you should move the weight vector for the incorrect class (j != y[i]) away from x[i], and at the same time move the weights, or hyperplane, for the correct class (j == y[i]) nearer to x[i]. By the parallelogram law, w + x lies in between w and x. So this way w[y[i]] tries to come nearer to x[i] each time it finds loss > 0.
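To make "comes nearer" concrete, here is a sketch of a plain gradient-descent step with a hypothetical learning rate $\eta$ (the learning rate is not part of the original post):

$$w_{y_i} \leftarrow w_{y_i} - \eta\,\nabla_{w_{y_i}} L_i = w_{y_i} + \eta \Big(\sum_{j \neq y_i} \mathbb{1}\big(w_j^{T} x_i - w_{y_i}^{T} x_i + \Delta > 0\big)\Big)\, x_i$$

so x_i is added to w_{y_i} once for every class that violates the margin, pulling it toward x_i, while each violating w_j receives -η x_i and is pushed away.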

Thus, dW[:, y[i]] += -X[i] and dW[:, j] += X[i] are both done in the loop, but at update time we move in the direction of decreasing gradient, so we are essentially adding X[i] to the correct class's weights and subtracting X[i] from the weights that misclassify, moving them away from x[i].
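Below is a minimal standalone sketch (my own illustration, not the assignment code; D, C, delta, and the random data are made up, and it uses Python 3's range rather than xrange). It ties the loop to the formula from the notes: accumulating -X[i] into dW[:, y[i]] once per violating class gives exactly -(number of violating classes) * X[i].

  import numpy as np

  np.random.seed(0)
  D, C = 5, 4                    # feature dimension, number of classes
  W = np.random.randn(D, C)      # weight matrix, one column per class
  x = np.random.randn(D)         # a single training example
  y = 2                          # its correct class
  delta = 1.0

  scores = x.dot(W)
  dW = np.zeros_like(W)
  violations = 0
  for j in range(C):
      if j == y:
          continue
      margin = scores[j] - scores[y] + delta
      if margin > 0:
          violations += 1
          dW[:, y] -= x          # pull the correct class's weights toward x
          dW[:, j] += x          # push the violating class's weights away from x

  # the loop's accumulated correct-class gradient equals the closed form from the notes
  assert np.allclose(dW[:, y], -violations * x)
  print("classes violating the margin:", violations)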
