Gradient of a Loss Function for an SVM
I'm working through a class on convolutional neural networks. I've been trying to implement the gradient of a loss function for an SVM, and (although I have a copy of the solution) I'm having trouble understanding why the solution is correct.
On this page it defines the gradient of the loss function as follows:

$$\nabla_{w_{y_i}} L_i = -\left( \sum_{j \neq y_i} \mathbb{1}\left( w_j^T x_i - w_{y_i}^T x_i + \Delta > 0 \right) \right) x_i$$

$$\nabla_{w_j} L_i = \mathbb{1}\left( w_j^T x_i - w_{y_i}^T x_i + \Delta > 0 \right) x_i \qquad \text{for } j \neq y_i$$

where $\mathbb{1}(\cdot)$ is the indicator function.
My analytic gradient matches the numeric one when implemented in code as follows:
import numpy as np

dW = np.zeros(W.shape)  # initialize the gradient as zero

# compute the loss and the gradient
num_classes = W.shape[1]
num_train = X.shape[0]
loss = 0.0
for i in range(num_train):
    scores = X[i].dot(W)
    correct_class_score = scores[y[i]]
    for j in range(num_classes):
        if j == y[i]:
            continue
        margin = scores[j] - correct_class_score + 1  # note delta = 1
        if margin > 0:
            dW[:, y[i]] += -X[i]
            dW[:, j] += X[i]  # gradient update for incorrect rows
            loss += margin
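For reference, the numeric gradient I compare against is computed roughly like this (a minimal sketch of a centered finite difference; loss_fn is a stand-in name for a function that returns the scalar SVM loss for fixed X and y, not the assignment's actual checker):

def numeric_grad_entry(loss_fn, W, ix, h=1e-5):
    # centered difference (f(W+h) - f(W-h)) / (2h) at one index of W
    old = W[ix]
    W[ix] = old + h
    fplus = loss_fn(W)
    W[ix] = old - h
    fminus = loss_fn(W)
    W[ix] = old  # restore the original entry
    return (fplus - fminus) / (2 * h)

# e.g. compare dW[3, 1] against numeric_grad_entry(loss_fn, W, (3, 1))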
However, from the notes it seems that dW[:, y[i]] should be changed every time j == y[i], since we subtract from the loss whenever j == y[i]. I'm very confused why the code is not:
dW = np.zeros(W.shape)  # initialize the gradient as zero

# compute the loss and the gradient
num_classes = W.shape[1]
num_train = X.shape[0]
loss = 0.0
for i in range(num_train):
    scores = X[i].dot(W)
    correct_class_score = scores[y[i]]
    for j in range(num_classes):
        if j == y[i]:
            if margin > 0:
                dW[:, y[i]] += -X[i]
            continue
        margin = scores[j] - correct_class_score + 1  # note delta = 1
        if margin > 0:
            dW[:, j] += X[i]  # gradient update for incorrect rows
            loss += margin
with the loss changing when j == y[i]. Why are both updates computed when j != y[i]?
I don't have enough reputation to comment, so I am answering here. Whenever you compute the loss for x[i], the i-th training example, and get some nonzero loss, that means you should move the weight vector of each incorrect class (j != y[i]) away from x[i], and at the same time move the weights, i.e. the hyperplane, of the correct class (j == y[i]) nearer to x[i]. By the parallelogram law, w + x lies between w and x. So this way w[y[i]] tries to come nearer to x[i] each time it finds loss > 0.
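Here is a quick toy check of that parallelogram intuition (my own example, not from the course):

import numpy as np

# Adding x to w turns w toward x: the cosine similarity increases.
w = np.array([1.0, 0.0])
x = np.array([0.0, 1.0])

def cosine(a, b):
    return a.dot(b) / (np.linalg.norm(a) * np.linalg.norm(b))

print(cosine(w, x))      # 0.0    -> w starts orthogonal to x
print(cosine(w + x, x))  # ~0.707 -> w + x points more toward x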
Thus, dW[:, y[i]] += -X[i] and dW[:, j] += X[i] are both done in the loop. But when we update, we step in the direction of decreasing gradient, so we are essentially adding X[i] to the correct class's weights and moving away by X[i] from the weights that misclassify.
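To make that last point concrete, this is the kind of parameter update being alluded to (a vanilla gradient-descent sketch; the learning_rate value is a placeholder, and W and dW are the matrices from the code above):

learning_rate = 1e-3  # placeholder step size, not from the assignment
W -= learning_rate * dW
# Because dW[:, y[i]] accumulated -X[i] for every violated margin, this
# step adds a multiple of X[i] to the correct class's weights and
# subtracts it from each misclassifying class's weights.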