
Implementing a softmax method in Python

I'm trying to understand this code from lightaime's GitHub page. It is a vectorized softmax method. What confuses me is "softmax_output[range(num_train), list(y)]"

What does this expression mean?

import numpy as np

def softmax_loss_vectorized(W, X, y, reg):
    """
    Softmax loss function, vectorized implementation.
    Inputs have dimension D, there are C classes, and we operate on minibatches of N examples.

    Inputs:
        W: A numpy array of shape (D, C) containing weights.
        X: A numpy array of shape (N, D) containing a minibatch of data.
        y: A numpy array of shape (N,) containing training labels; y[i] = c means that X[i] has label c, where 0 <= c < C.
        reg: (float) regularization strength

    Returns a tuple of:
        loss as single float
        gradient with respect to weights W; an array of same shape as W
    """
    # Initialize the loss and gradient to zero.
    loss = 0.0
    dW = np.zeros_like(W)

    num_classes = W.shape[1]
    num_train = X.shape[0]
    scores = X.dot(W)
    # Shift each row by its maximum for numerical stability; the softmax values are unchanged.
    shift_scores = scores - np.max(scores, axis=1).reshape(-1, 1)
    softmax_output = np.exp(shift_scores) / np.sum(np.exp(shift_scores), axis=1).reshape(-1, 1)
    # For each sample i, pick the predicted probability of its true class y[i].
    loss = -np.sum(np.log(softmax_output[range(num_train), list(y)]))
    loss /= num_train
    loss += 0.5 * reg * np.sum(W * W)

    # Gradient of the loss w.r.t. the scores: softmax - 1 at the true class, softmax elsewhere.
    dS = softmax_output.copy()
    dS[range(num_train), list(y)] += -1
    dW = (X.T).dot(dS)
    dW = dW / num_train + reg * W
    return loss, dW
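
For reference, a minimal sketch of calling this function on dummy data (the shapes below are arbitrary, chosen only for illustration):

import numpy as np

np.random.seed(0)
D, C, N = 5, 3, 4  # feature dimension, number of classes, minibatch size (arbitrary)
W = 0.01 * np.random.randn(D, C)
X = np.random.randn(N, D)
y = np.random.randint(0, C, size=N)  # one integer label per sample

loss, dW = softmax_loss_vectorized(W, X, y, reg=1e-3)
print(loss)       # a single float: mean cross-entropy plus L2 penalty
print(dW.shape)   # (5, 3), same shape as W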

This expression means: slice the array softmax_output of shape (N, C), extracting from it only the values related to the training labels y.

A two-dimensional numpy array can be sliced with two lists containing appropriate values (i.e. they should not cause an index error).

range(num_train) creates an index for the first axis, which allows selecting a specific value in each row via the second index, list(y). You can find this in the numpy documentation for indexing.

The first index, range(num_train), has a length equal to the first dimension of softmax_output (= N). It points to each row of the matrix; then for each row it selects the target value via the corresponding value from the second part of the index, list(y).

Example:

softmax_output = np.array(  # dummy values, not softmax
    [[1, 2, 3], 
     [4, 5, 6],
     [7, 8, 9],
     [10, 11, 12]]
)
num_train = 4  # length of the array
y = [2, 1, 0, 2]  # labels; values for indexing along the second axis
softmax_output[range(num_train), list(y)]
Out:
array([ 3,  5,  7, 12])

So it selects the third element from the first row, the second from the second row, and so on. That's how it works.
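
Equivalently, the same selection can be written as an explicit Python loop; a small sketch (illustration only, not part of the original code) to confirm they match:

import numpy as np

softmax_output = np.array(
    [[1, 2, 3],
     [4, 5, 6],
     [7, 8, 9],
     [10, 11, 12]]
)
y = [2, 1, 0, 2]

# Fancy indexing: row i paired with column y[i]
fancy = softmax_output[range(len(y)), list(y)]

# The same selection, one row at a time
looped = np.array([softmax_output[i, y[i]] for i in range(len(y))])

assert (fancy == looped).all()  # both give [3, 5, 7, 12]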

(P.S. Do I misunderstand you, and you are interested in "why", not "how"?)

The loss here is defined by the following equation:

$$L = -\frac{1}{N}\sum_{i=1}^{N}\sum_{j=1}^{C} y_{ij} \log(p_{ij})$$

where $p_{ij}$ is the softmax output for sample $i$ and class $j$.

Here, $y_{ij}$ is 1 for the class the datapoint belongs to and 0 for all other classes. Thus we are only interested in the softmax outputs for the datapoint's class, and the above equation can be rewritten as

$$L = -\frac{1}{N}\sum_{i=1}^{N} \log(p_{i,\,y_i})$$

Thus, the following code represents the above equation (the division by N happens afterwards via loss /= num_train):

loss = -np.sum(np.log(softmax_output[range(num_train), list(y)]))

The code softmax_output[range(num_train), list(y)] is used to select the softmax outputs for the respective classes. range(num_train) indexes all the training samples and list(y) gives each sample's class.
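
To see the equation and the code line up, here is a small check (dummy values; the names below are illustrative, not from the original code) comparing the vectorized selection against a per-sample loop:

import numpy as np

np.random.seed(0)
N, C = 4, 3  # dummy minibatch size and class count
scores = np.random.randn(N, C)
y = np.random.randint(0, C, size=N)

# Softmax rows, with the usual max-shift for numerical stability
shift = scores - scores.max(axis=1, keepdims=True)
p = np.exp(shift) / np.exp(shift).sum(axis=1, keepdims=True)

# Vectorized: -(1/N) * sum_i log p[i, y[i]]
loss_vec = -np.sum(np.log(p[range(N), list(y)])) / N

# Naive: the same per-sample terms accumulated in a loop
loss_loop = -sum(np.log(p[i, y[i]]) for i in range(N)) / N

assert np.isclose(loss_vec, loss_loop)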

This indexing is nicely explained by Mikhail in his answer.
