Deep learning: the code for backpropagation in Python

I was reading the free online book and struggling with part of the code.

(The code is from Michael Nielsen's book.)

import numpy as np

class Network(object):

    def update_mini_batch(self, mini_batch, eta):
        """Update the network's weights and biases by applying
        gradient descent using backpropagation to a single mini batch.
        The "mini_batch" is a list of tuples "(x, y)", and "eta"
        is the learning rate."""
        # accumulators for the gradients, summed over the mini-batch
        nabla_b = [np.zeros(b.shape) for b in self.biases]
        nabla_w = [np.zeros(w.shape) for w in self.weights]
        for x, y in mini_batch:
            delta_nabla_b, delta_nabla_w = self.backprop(x, y)
            nabla_b = [nb+dnb for nb, dnb in zip(nabla_b, delta_nabla_b)]
            nabla_w = [nw+dnw for nw, dnw in zip(nabla_w, delta_nabla_w)]
        # gradient-descent step, averaged over the mini-batch
        self.weights = [w-(eta/len(mini_batch))*nw
                        for w, nw in zip(self.weights, nabla_w)]
        self.biases = [b-(eta/len(mini_batch))*nb
                       for b, nb in zip(self.biases, nabla_b)]


    def backprop(self, x, y):
        nabla_b = [np.zeros(b.shape) for b in self.biases]
        nabla_w = [np.zeros(w.shape) for w in self.weights]
        # feedforward
        activation = x
        activations = [x]  # list to store all the activations, layer by layer
        zs = []  # list to store all the z vectors, layer by layer
        for b, w in zip(self.biases, self.weights):
            z = np.dot(w, activation)+b
            zs.append(z)
            activation = sigmoid(z)
            activations.append(activation)
        # backward pass
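        # For reference (not part of the question's excerpt): in the book the
        # method continues with the backward pass, roughly as follows, with
        # Python 2's xrange written here as range. sigmoid_prime is the
        # derivative of sigmoid, defined alongside it in the book's code.
        delta = self.cost_derivative(activations[-1], y) * \
            sigmoid_prime(zs[-1])
        nabla_b[-1] = delta
        nabla_w[-1] = np.dot(delta, activations[-2].transpose())
        for l in range(2, self.num_layers):
            z = zs[-l]
            sp = sigmoid_prime(z)
            delta = np.dot(self.weights[-l+1].transpose(), delta) * sp
            nabla_b[-l] = delta
            nabla_w[-l] = np.dot(delta, activations[-l-1].transpose())
        return (nabla_b, nabla_w)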

Because it says mini_batch is a list of tuples (x, y), the argument x of the function backprop is a scalar, right? If so, I am confused: w (the weight matrix, say of dimension n*p, whose n rows correspond to the neurons in layer l and whose p columns correspond to the neurons in layer l-1) is multiplied with x, so x would have to be a p x 1 vector, not a scalar.

In the book's example, the network is [2,3,1], i.e. three layers with 2, 3 and 1 neurons respectively. Because the first layer takes the inputs, it has two elements, so the weight matrix of the 2nd layer has dimensions 3*2. It seems x should be a vector of length 2 for the matrix multiplication with w to work.

Also, the code for the partial derivative of C_x with respect to the activation a is as follows:

def cost_derivative(self, output_activations, y):
    """Return the vector of partial derivatives \partial C_x /
    \partial a for the output activations."""
    return (output_activations-y)

I checked the formula, and I understand that (output_activations - y) represents the change in the cost. But isn't this supposed to be divided by the change in the activations?

Could you help me?

Because it says "mini_batch" is a list of tuples "(x, y)", the argument of x in the function backprop is a scalar, right?

No. The word "batch" corresponds to a Python list, and in that list there are pairs (x, y), where x is the input vector and y is the label. The shape of x depends on how you create your network object. For the network [2, 3, 1], x should be a column vector of shape (2, 1): the book's data loader feeds each input as a column vector, so that np.dot(w, x) + b broadcasts correctly against the (3, 1) bias.
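A quick way to convince yourself of the shapes is to run the feedforward step by hand. The sketch below is standalone and uses illustrative random values; the weight and bias shapes mirror what Network.__init__ in the book produces for [2, 3, 1]:

import numpy as np

sizes = [2, 3, 1]
biases = [np.random.randn(n, 1) for n in sizes[1:]]    # shapes (3, 1), (1, 1)
weights = [np.random.randn(n, m)
           for m, n in zip(sizes[:-1], sizes[1:])]     # shapes (3, 2), (1, 3)

x = np.random.randn(2, 1)  # one input: a 2x1 column vector, not a scalar
activation = x
for b, w in zip(biases, weights):
    z = np.dot(w, activation) + b
    activation = 1.0 / (1.0 + np.exp(-z))              # sigmoid
    print(activation.shape)                            # (3, 1), then (1, 1)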

But isn't this supposed to be divided by the change in the activations?

No.

First, what you are thinking of is called "numerical differentiation". Since you are not computing a change in the cost here, there is nothing to divide by the change in the activation.

Second, what the author uses is called "analytical differentiation". Say you have the function f(x, y) = 0.5*(x-y)^2. The partial derivative of f with respect to x is (x-y). So there is no need to divide by the change in x. However, you need to pay attention to the actual cost function in order to derive the correct derivative; it is not always obvious how the values are calculated. In this case the loss is the quadratic (mean squared error) cost, as stated in the online book.
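To make the distinction concrete, here is a self-contained sketch with made-up numbers: the finite-difference quotient is the "numerical" route (a change in cost divided by a change in a), while a - y is the "analytical" derivative of the quadratic cost C = 0.5*(a - y)^2 that cost_derivative returns.

def cost(a, y):
    """Quadratic cost for a single output: C = 0.5*(a - y)**2."""
    return 0.5 * (a - y) ** 2

a, y, eps = 0.8, 1.0, 1e-6

# Numerical differentiation: a change in cost divided by a change in a.
numerical = (cost(a + eps, y) - cost(a - eps, y)) / (2 * eps)

# Analytical differentiation: the closed-form derivative; no division needed.
analytical = a - y

print(numerical, analytical)  # both approximately -0.2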


Reply to comment:

training set whose input is vector_x = (1,2,3)

A training set is a container holding a set of training samples, where each sample consists of an input and the corresponding label. So an example of a mini-batch might be a Python list like [([0, 1], [1, 2]), ([2, 1], [3, 2])], which contains two training samples: the first is ([0, 1], [1, 2]) and the second is ([2, 1], [3, 2]). Taking the first as an example, its input is [0, 1] (a vector of shape (2,)) and its desired output is [1, 2]. That is, both the input and the desired output of a single training sample can be vectors. There are some ambiguities in your notation ([(1,a),(2,b),(3,c)]), so I would prefer my explanation.
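In numpy terms, a mini-batch that update_mini_batch could actually consume might look like the following (a hypothetical example; the label shapes are chosen to match a [2, 3, 1] network, whose output layer has a single neuron):

import numpy as np

# A two-sample mini-batch in the column-vector convention the book's
# data loader uses: a list of (input, label) tuples.
mini_batch = [
    (np.array([[0.0], [1.0]]), np.array([[1.0]])),  # sample 1
    (np.array([[2.0], [1.0]]), np.array([[0.0]])),  # sample 2
]

for x, y in mini_batch:
    print(x.shape, y.shape)  # (2, 1) (1, 1) -- vectors, not scalars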
