
Implementing softmax method in Python

I'm trying to understand this code from lightaime's GitHub page. It is a vectorized softmax method. What confuses me is "softmax_output[range(num_train), list(y)]"

What does this expression mean?

import numpy as np

def softmax_loss_vectorized(W, X, y, reg):
    """
    Softmax loss function, vectorized implementation.
    Inputs have dimension D, there are C classes, and we operate on minibatches of N examples.

        W: A numpy array of shape (D, C) containing weights.
        X: A numpy array of shape (N, D) containing a minibatch of data.
        y: A numpy array of shape (N,) containing training labels; y[i] = c means that X[i] has label c, where 0 <= c < C.
        reg: (float) regularization strength

    Returns a tuple of:
        loss as single float
        gradient with respect to weights W; an array of same shape as W
    """

    # Initialize the loss and gradient to zero.
    loss = 0.0
    dW = np.zeros_like(W)

    num_classes = W.shape[1]
    num_train = X.shape[0]
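    scores = X.dot(W)  # class scores, shape (N, C)
    # Subtract each row's maximum before exponentiating, for numerical stability.
    shift_scores = scores - np.max(scores, axis=1).reshape(-1, 1)
    softmax_output = np.exp(shift_scores) / np.sum(np.exp(shift_scores), axis=1).reshape(-1, 1)
    # Pick the predicted probability of the correct class for every example.
    loss = -np.sum(np.log(softmax_output[range(num_train), list(y)]))
    loss /= num_train
    loss += 0.5 * reg * np.sum(W * W)

    # Probabilities minus one-hot labels (averaged over the batch two lines below).
    dS = softmax_output.copy()
    dS[range(num_train), list(y)] += -1
    dW = (X.T).dot(dS)
    dW = dW / num_train + reg * W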
    return loss, dW

This expression indexes the array softmax_output of shape (N, C), extracting from each row only the value in the column given by the corresponding training label in y.

A two-dimensional numpy.array can be indexed with two lists of integers, one per axis, provided the lists have the same length and contain valid indices (i.e. they do not cause an IndexError).

range(num_train) creates an index for the first axis, which makes it possible to select a specific value in each row with the second index, list(y). This is described in the NumPy documentation on indexing.

The first index, range(num_train), has a length equal to the first dimension of softmax_output (= N). It points to each row of the matrix; for each row, the target value is then selected via the corresponding entry of the second index, list(y).
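
One way to read the paired index is as an explicit loop over (row, label) pairs. The sketch below assumes a small made-up score matrix (the values and the names `picked` and `looped` are illustrative, not from the original code) and checks that both forms agree:

```python
import numpy as np

# Made-up scores for N = 3 examples and C = 4 classes (illustrative values only).
scores = np.array([[1.0, 2.0, 0.5, 0.0],
                   [0.1, 0.2, 3.0, 1.0],
                   [2.0, 2.0, 2.0, 2.0]])
y = np.array([1, 2, 0])  # one label per row
num_train = scores.shape[0]

# Row-wise softmax, shifted by the row maximum as in the question's code.
shifted = scores - scores.max(axis=1, keepdims=True)
softmax_output = np.exp(shifted) / np.exp(shifted).sum(axis=1, keepdims=True)

# Paired integer indexing: element (i, y[i]) for every row i.
picked = softmax_output[range(num_train), list(y)]

# The same selection written as an explicit loop.
looped = np.array([softmax_output[i, y[i]] for i in range(num_train)])

print(picked.shape)                 # (3,)
print(np.allclose(picked, looped))  # True
```

The loop version is slower on large batches, which is presumably why the vectorized code uses the paired index instead.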


import numpy as np

softmax_output = np.array(  # dummy values, not softmax
    [[1, 2, 3],
     [4, 5, 6],
     [7, 8, 9],
     [10, 11, 12]])
num_train = 4  # length of the array
y = [2, 1, 0, 2]  # labels; values for indexing along the second axis
print(softmax_output[range(num_train), list(y)])
# [ 3  5  7 12]

So it selects the third element from the first row, the second from the second row, and so on. That's how it works.
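
A common point of confusion is how this differs from `softmax_output[:, y]`. As a small sketch with the same dummy values (the names `paired` and `sliced` are only for illustration): a slice on the first axis keeps every row and takes the listed columns from each of them, so the result is a matrix rather than one value per row.

```python
import numpy as np

softmax_output = np.array([[1, 2, 3],
                           [4, 5, 6],
                           [7, 8, 9],
                           [10, 11, 12]])  # dummy values, not softmax
y = [2, 1, 0, 2]
num_train = softmax_output.shape[0]

# Two integer sequences are paired element by element: (0, 2), (1, 1), (2, 0), (3, 2).
paired = softmax_output[range(num_train), y]
print(paired.shape)  # (4,)

# A slice plus a list takes columns 2, 1, 0, 2 from every row instead.
sliced = softmax_output[:, y]
print(sliced.shape)  # (4, 4)

# The paired result is the diagonal of the sliced one.
print(np.array_equal(paired, np.diag(sliced)))  # True
```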

(P.S. Did I misunderstand you, and are you interested in "why" rather than "how"?)

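The loss here is the cross-entropy, defined by the following equation, where p_{i,c} is the softmax output for datapoint i and class c (softmax_output[i, c] in the code):

$$L = -\sum_{i} \sum_{c} y_{i,c} \log p_{i,c}$$

Here, y_{i,c} is 1 for the class that datapoint i belongs to and 0 for all other classes. Thus we are only interested in the softmax output for each datapoint's own class, and the equation above can be rewritten as

$$L = -\sum_{i} \log p_{i,\,y_i}$$

where y_i is the integer label of datapoint i (y[i] in the code).

The following code implements the equation above: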

loss = -np.sum(np.log(softmax_output[range(num_train), list(y)]))

The expression softmax_output[range(num_train), list(y)] selects the softmax output of the correct class for each sample: range(num_train) indexes all the training samples and list(y) gives their respective classes.
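
To see that the indexing form gives the same value as the one-hot form of the cross-entropy, here is a small sketch. The probabilities are made up, and `one_hot` is a helper array introduced only for this illustration:

```python
import numpy as np

# Made-up softmax outputs for N = 3 examples and C = 3 classes; each row sums to 1.
softmax_output = np.array([[0.7, 0.2, 0.1],
                           [0.1, 0.6, 0.3],
                           [0.25, 0.25, 0.5]])
y = np.array([0, 1, 2])
num_train, num_classes = softmax_output.shape

# One-hot encoding of the labels: one_hot[i, c] is 1 only where c == y[i].
one_hot = np.zeros((num_train, num_classes))
one_hot[range(num_train), list(y)] = 1

# Full double sum over examples and classes.
loss_one_hot = -np.sum(one_hot * np.log(softmax_output))

# Shortcut: only look up the probability of each example's own class.
loss_indexed = -np.sum(np.log(softmax_output[range(num_train), list(y)]))

print(np.isclose(loss_one_hot, loss_indexed))  # True
```

The indexed form avoids building the (N, C) one-hot matrix and multiplying by entries that are mostly zero.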

This indexing is nicely explained by Mikhail in his answer.
