
Compute the gradient of the SVM loss function

I am trying to implement the SVM loss function and its gradient. I found some example projects that implement both, but I could not figure out how the loss function is used when computing the gradient.

Here is the formula of the loss function:

L_i = \sum_{j \neq y_i} \max\left(0,\; w_j^T x_i - w_{y_i}^T x_i + \Delta\right)

What I cannot understand is how the loss function's result is used while computing the gradient.

The example project computes the gradient as follows:

for i in xrange(num_train):
    scores = X[i].dot(W)                    # class scores for example i
    correct_class_score = scores[y[i]]      # score of the true class
    for j in xrange(num_classes):
      if j == y[i]:
        continue
      margin = scores[j] - correct_class_score + 1 # note delta = 1
      if margin > 0:                        # hinge: only positive margins contribute
        loss += margin
        dW[:,j] += X[i]                     # gradient w.r.t. the wrong class column
        dW[:,y[i]] -= X[i]                  # gradient w.r.t. the correct class column

dW holds the gradient result, and X is the array of training data. But I don't understand how the derivative of the loss function leads to this code.

The method used to calculate the gradient here is calculus (analytically, NOT numerically!). So we differentiate the loss function with respect to w_{y_i} like this:

\nabla_{w_{y_i}} L_i = -\left( \sum_{j \neq y_i} \mathbb{1}\left( w_j^T x_i - w_{y_i}^T x_i + \Delta > 0 \right) \right) x_i

and with respect to w_j, when j != y_i, it is:

\nabla_{w_j} L_i = \mathbb{1}\left( w_j^T x_i - w_{y_i}^T x_i + \Delta > 0 \right) x_i

The \mathbb{1}(\cdot) is just the indicator function: it equals 1 when the condition inside holds (the margin is positive) and 0 otherwise, so a term contributes to the gradient only when its margin is positive. Written out in code, that is exactly what the example you provided does.
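Here is a minimal sketch (my own illustration, not taken from the cs231n assignment) of how those two formulas map onto code for a single training example; the values of W, x, y, and delta are made up:

    import numpy as np

    # Toy data (made up): 4 features, 3 classes.
    W = np.array([[ 0.1, -0.2,  0.3],
                  [ 0.4,  0.1, -0.1],
                  [-0.3,  0.2,  0.0],
                  [ 0.2, -0.1,  0.1]])   # shape (4, 3): one column per class
    x = np.array([1.0, 2.0, -1.0, 0.5])  # one training example
    y = 1                                # index of the correct class
    delta = 1.0

    scores = x.dot(W)                     # s_j = w_j^T x for every class j
    margins = scores - scores[y] + delta  # s_j - s_{y_i} + delta
    margins[y] = 0                        # the correct class is excluded from the sum

    loss = np.sum(np.maximum(0, margins))

    # Indicator: 1 where the margin is positive, 0 elsewhere.
    indicator = (margins > 0).astype(float)

    dW = np.outer(x, indicator)           # dL/dw_j = 1(margin_j > 0) * x   for j != y
    dW[:, y] = -np.sum(indicator) * x     # dL/dw_y = -(number of positive margins) * x

    print(loss)
    print(dW)

The two dW lines are just the vector form of the inner loop in your example: each positive margin adds X[i] to the wrong class column and subtracts X[i] from the correct class column.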

Since you are working from the cs231n example, you should definitely check the course notes and lecture videos if needed.

Hope this helps!

If the subtraction (the margin) is less than or equal to zero, that term of the loss is zero, so its contribution to the gradient of W is also zero. If the margin is greater than zero, the gradient contribution is the partial derivative of that loss term.

If we remove these two lines of code:

dW[:,j] += X[i]
dW[:,y[i]] -= X[i] 

the loop still computes the loss value, but dW is never accumulated, so we lose the gradient.
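To see both points at once, here is a self-contained sketch (my own illustration, with made-up random X, y, and W) that wraps the loop in a function and compares the analytic dW against a numerical estimate; deleting the two dW lines would leave the loss unchanged but make dW all zeros:

    import numpy as np

    def svm_loss_naive(W, X, y, delta=1.0):
        """Multiclass SVM loss and gradient, computed with explicit loops."""
        num_train, num_classes = X.shape[0], W.shape[1]
        loss = 0.0
        dW = np.zeros_like(W)
        for i in range(num_train):
            scores = X[i].dot(W)
            correct_class_score = scores[y[i]]
            for j in range(num_classes):
                if j == y[i]:
                    continue
                margin = scores[j] - correct_class_score + delta
                if margin > 0:
                    loss += margin
                    dW[:, j] += X[i]      # remove these two lines and you still
                    dW[:, y[i]] -= X[i]   # get the loss, but dW stays all zeros
        return loss, dW

    # Numerical check with central differences on a few random entries of W.
    np.random.seed(0)
    X = np.random.randn(5, 4)             # 5 made-up examples, 4 features
    y = np.random.randint(0, 3, size=5)   # 3 classes
    W = np.random.randn(4, 3) * 0.01

    loss, dW = svm_loss_naive(W, X, y)
    h = 1e-5
    for _ in range(3):
        ix = tuple(np.random.randint(s) for s in W.shape)
        W[ix] += h
        loss_plus, _ = svm_loss_naive(W, X, y)
        W[ix] -= 2 * h
        loss_minus, _ = svm_loss_naive(W, X, y)
        W[ix] += h                         # restore the original weight
        numeric = (loss_plus - loss_minus) / (2 * h)
        print(ix, numeric, dW[ix])         # the two values should match closely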
