Back-propagation and forward-propagation for 2 hidden layers in neural network

My question is about forward and backward propagation for deep neural networks when the number of hidden layers is greater than 1.

I know what to do if I have a single hidden layer. In the single-hidden-layer case, if my input data X_train has n samples with d features (i.e. X_train is an (n, d) matrix and y_train is an (n, 1) vector) and I have h1 hidden units in the first hidden layer, then I compute Z_h1 = (X_train * w_h1) + b_h1, where w_h1 is a randomly initialized weight matrix of shape (d, h1) and b_h1 is a bias vector of shape (h1, 1). I apply the sigmoid activation A_h1 = sigmoid(Z_h1), and both A_h1 and Z_h1 have shape (n, h1). If I have t output units, then I use a weight matrix w_out of shape (h1, t) and a bias b_out of shape (t, 1) to get the output Z_out = (A_h1 * w_out) + b_out. From here I can get A_out = sigmoid(Z_out), which has shape (n, t). If I have a 2nd hidden layer (with h2 units) after the 1st hidden layer and before the output layer, what steps must I add to forward propagation, and which steps should I modify?
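For concreteness, here is a minimal NumPy sketch of the single-hidden-layer forward pass described above. The variable names follow the question, the sizes and data are placeholders, and the biases are stored as (1, h1) / (1, t) row vectors so they broadcast over the n samples:

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    n, d, h1, t = 100, 4, 8, 3             # samples, features, hidden units, output units (placeholders)
    X_train = np.random.randn(n, d)        # placeholder input data

    w_h1  = np.random.randn(d, h1) * 0.01  # (d, h1)
    b_h1  = np.zeros((1, h1))              # (1, h1), broadcasts over the n samples
    w_out = np.random.randn(h1, t) * 0.01  # (h1, t)
    b_out = np.zeros((1, t))               # (1, t)

    Z_h1  = X_train @ w_h1 + b_h1          # (n, h1)
    A_h1  = sigmoid(Z_h1)                  # (n, h1)
    Z_out = A_h1 @ w_out + b_out           # (n, t)  -- uses w_out/b_out, not w_h1/b_h1
    A_out = sigmoid(Z_out)                 # (n, t)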

I also have an idea of how to tackle backpropagation for a single-hidden-layer network. For the single-hidden-layer example in the previous paragraph, I know that in the first backpropagation step (output layer -> hidden layer 1) I should do Step1_BP1: Err_out = A_out - y_train_onehot, where y_train_onehot is the one-hot representation of y_train and Err_out has shape (n, t). This is followed by Step2_BP1: delta_w_out = (A_h1)^T * Err_out and delta_b_out = sum(Err_out), where (.)^T denotes the matrix transpose. For the second backpropagation step (hidden layer 1 -> input layer), I do Step1_BP2: sig_deriv_h1 = A_h1 * (1 - A_h1), where sig_deriv_h1 has shape (n, h1). In the next step I do Step2_BP2: Err_h1 = (Err_out * w_out^T) ⊙ sig_deriv_h1, i.e. an element-wise product of two (n, h1) matrices, so Err_h1 has shape (n, h1). In the final step I do Step3_BP2: delta_w_h1 = (X_train)^T * Err_h1 and delta_b_h1 = sum(Err_h1). What backpropagation steps should I add if I have a 2nd hidden layer (with h2 units) after the 1st hidden layer and before the output layer? Should I modify the backpropagation steps for the one-hidden-layer case that I have described here?
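Continuing the sketch above, the single-hidden-layer backpropagation steps from this paragraph might look like the following in NumPy; the labels and the learning rate eta are placeholder assumptions:

    # continuing from the forward-pass sketch above
    y_train = np.random.randint(0, t, size=n)          # placeholder integer labels
    y_train_onehot = np.eye(t)[y_train]                # (n, t)

    # output layer -> hidden layer 1
    Err_out     = A_out - y_train_onehot               # (n, t)
    delta_w_out = A_h1.T @ Err_out                     # (h1, t)
    delta_b_out = Err_out.sum(axis=0, keepdims=True)   # (1, t)

    # hidden layer 1 -> input layer
    sig_deriv_h1 = A_h1 * (1.0 - A_h1)                 # (n, h1)
    Err_h1       = (Err_out @ w_out.T) * sig_deriv_h1  # element-wise product, (n, h1)
    delta_w_h1   = X_train.T @ Err_h1                  # (d, h1)
    delta_b_h1   = Err_h1.sum(axis=0, keepdims=True)   # (1, h1)

    eta = 0.1                                          # learning rate (assumed)
    w_out -= eta * delta_w_out;  b_out -= eta * delta_b_out
    w_h1  -= eta * delta_w_h1;   b_h1  -= eta * delta_b_h1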

● Let X be a matrix of samples with shape (n, d) , where n denotes number of samples, and d denotes number of features.

● Let w_h1 be the weight matrix, of shape (d, h1), and

● Let b_h1 be the bias vector, of shape (1, h1).

You need the following steps for forward and backward propagations:

FORWARD PROPAGATION:

Step 1:

Z_h1 = [ X • w_h1 ] + b_h1

shapes: (n,h1) = (n,d) • (d,h1) + (1,h1)

Here, the symbol • represents matrix multiplication, and h1 denotes the number of hidden units in the first hidden layer.
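As a quick check of the shapes in Step 1, a small NumPy snippet with made-up sizes might look like this:

    import numpy as np

    n, d, h1 = 5, 3, 4                   # made-up sizes
    X    = np.random.randn(n, d)
    w_h1 = np.random.randn(d, h1)
    b_h1 = np.zeros((1, h1))

    Z_h1 = X @ w_h1 + b_h1               # • is matrix multiplication; b_h1 broadcasts over the rows
    assert Z_h1.shape == (n, h1)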

Step 2:

Let Φ() be the activation function. We get:

a_h1 = Φ(Z_h1)

shapes: (n,h1) = Φ( (n,h1) )

Step 3:

Obtain new weights and biases:

w_h2 of shape (h1, h2), and

b_h2 of shape (1, h2).

Step 4:

Z_h2 = [ a_h1 • w_h2 ] + b_h2

shapes: (n,h2) = (n,h1) • (h1,h2) + (1,h2)

Here, h2 is the number of hidden units in the second hidden layer.

Step 5:

a_h2 = Φ(Z_h2)

shapes: (n,h2) = Φ( (n,h2) )

Step 6:

Obtain new weights and biases:

w_out of shape (h2, t), and

b_out of shape (1, t).

Here, t is the number of classes.

Step 7:

Z_out = [ a_h2 • w_out ] + b_out

shapes: (n,t) = (n,h2) • (h2,t) + (1,t)

Step 8:

a_out = Φ(Z_out)

shapes: (n,t) = Φ( (n,t) )
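Putting Steps 1 through 8 together, a minimal NumPy sketch of the whole two-hidden-layer forward pass might look like this (sigmoid is used for Φ; all sizes are placeholder assumptions):

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    n, d, h1, h2, t = 100, 4, 8, 6, 3             # placeholder sizes

    X = np.random.randn(n, d)

    # parameters: randomly initialized weights, zero biases
    w_h1,  b_h1  = np.random.randn(d,  h1) * 0.01, np.zeros((1, h1))
    w_h2,  b_h2  = np.random.randn(h1, h2) * 0.01, np.zeros((1, h2))
    w_out, b_out = np.random.randn(h2, t)  * 0.01, np.zeros((1, t))

    # Steps 1-2: first hidden layer
    Z_h1 = X @ w_h1 + b_h1        # (n, h1)
    a_h1 = sigmoid(Z_h1)          # (n, h1)

    # Steps 3-5: second hidden layer
    Z_h2 = a_h1 @ w_h2 + b_h2     # (n, h2)
    a_h2 = sigmoid(Z_h2)          # (n, h2)

    # Steps 6-8: output layer
    Z_out = a_h2 @ w_out + b_out  # (n, t)
    a_out = sigmoid(Z_out)        # (n, t)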

BACKWARD PROPAGATION:

Step 1:

Construct the one-hot encoded matrix of the output labels (y_one-hot), with one column per unique class.

Error_out = a_out - y_one-hot

shapes: (n,t) = (n,t) - (n,t)

Step 2:

Δw_out = η ( a_h2^T • Error_out )

shapes: (h2,t) = (h2,n) • (n,t)

Δb_out = η [ ∑_{i=1}^{n} (Error_out,i) ]

shapes: (1,t) = (1,t)

Here η is the learning rate.

w_out = w_out - Δw_out   (weight update)

b_out = b_out - Δb_out   (bias update)
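In NumPy, backward Steps 1-2 could be written roughly as follows, continuing from the forward-pass sketch after Step 8; the integer labels y and the learning rate eta are assumptions:

    # continuing from the forward-pass sketch after Step 8
    y = np.random.randint(0, t, size=n)                   # placeholder integer labels
    y_onehot = np.eye(t)[y]                               # (n, t)
    eta = 0.1                                             # learning rate (assumed)

    Error_out = a_out - y_onehot                          # Step 1, (n, t)

    dw_out = eta * (a_h2.T @ Error_out)                   # (h2, t)
    db_out = eta * Error_out.sum(axis=0, keepdims=True)   # (1, t)

    w_out -= dw_out                                       # weight update
    b_out -= db_out                                       # bias update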

Step 3:

Error_2 = [ Error_out • w_out^T ] ✴ Φ′(a_h2)

shapes: (n,h2) = (n,t) • (t,h2) ✴ (n,h2)

Here, the symbol ✴ denotes element-wise matrix multiplication, and Φ′ denotes the derivative of the activation function. For the sigmoid, the derivative can be written in terms of the activation itself: Φ′(a_h2) = a_h2 ✴ (1 - a_h2).
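In code, Step 3 could look like the following, continuing from the snippet above (note that, if you follow the step order literally, w_out here is the matrix that was already updated in Step 2):

    # Φ′ for the sigmoid, written in terms of the activation itself
    sig_deriv_h2 = a_h2 * (1.0 - a_h2)               # (n, h2)
    Error_2 = (Error_out @ w_out.T) * sig_deriv_h2   # '*' here is element-wise, (n, h2)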

Step 4:

Δw_h2 = η ( a_h1^T • Error_2 )

shapes: (h1,h2) = (h1,n) • (n,h2)

Δb_h2 = η [ ∑_{i=1}^{n} (Error_2,i) ]

shapes: (1,h2) = (1,h2)

w_h2 = w_h2 - Δw_h2   (weight update)

b_h2 = b_h2 - Δb_h2   (bias update)
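Step 4 in NumPy, continuing from the snippets above:

    dw_h2 = eta * (a_h1.T @ Error_2)                  # (h1, h2)
    db_h2 = eta * Error_2.sum(axis=0, keepdims=True)  # (1, h2)

    w_h2 -= dw_h2                                     # weight update
    b_h2 -= db_h2                                     # bias update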

Step 5:

Error_3 = [ Error_2 • w_h2^T ] ✴ Φ′(a_h1)

shapes: (n,h1) = (n,h2) • (h2,h1) ✴ (n,h1)

Step 6:

Δw_h1 = η ( X^T • Error_3 )

shapes: (d,h1) = (d,n) • (n,h1)

Δb_h1 = η [ ∑_{i=1}^{n} (Error_3,i) ]

shapes: (1,h1) = (1,h1)

w_h1 = w_h1 - Δw_h1   (weight update)

b_h1 = b_h1 - Δb_h1   (bias update)
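Putting backward Steps 1 through 6 together, here is a self-contained NumPy sketch (sigmoid activations, placeholder sizes and learning rate) that runs one forward pass and one full backward pass. It computes all three error terms before applying any update, which is the usual convention; following the numbered steps literally (updating w_out in Step 2 and then reusing it in Step 3) also works but gives slightly different numbers.

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    n, d, h1, h2, t, eta = 100, 4, 8, 6, 3, 0.1        # placeholder sizes and learning rate
    rng = np.random.default_rng(0)

    X = rng.standard_normal((n, d))
    y_onehot = np.eye(t)[rng.integers(0, t, size=n)]   # (n, t) placeholder labels

    w_h1,  b_h1  = rng.standard_normal((d,  h1)) * 0.01, np.zeros((1, h1))
    w_h2,  b_h2  = rng.standard_normal((h1, h2)) * 0.01, np.zeros((1, h2))
    w_out, b_out = rng.standard_normal((h2, t))  * 0.01, np.zeros((1, t))

    # forward pass (Steps 1-8 above)
    a_h1  = sigmoid(X @ w_h1 + b_h1)                   # (n, h1)
    a_h2  = sigmoid(a_h1 @ w_h2 + b_h2)                # (n, h2)
    a_out = sigmoid(a_h2 @ w_out + b_out)              # (n, t)

    # backward pass (Steps 1-6 above), all errors computed before the updates
    Error_out = a_out - y_onehot                             # (n, t)
    Error_2   = (Error_out @ w_out.T) * a_h2 * (1.0 - a_h2)  # (n, h2)
    Error_3   = (Error_2 @ w_h2.T)    * a_h1 * (1.0 - a_h1)  # (n, h1)

    w_out -= eta * (a_h2.T @ Error_out); b_out -= eta * Error_out.sum(0, keepdims=True)
    w_h2  -= eta * (a_h1.T @ Error_2);   b_h2  -= eta * Error_2.sum(0, keepdims=True)
    w_h1  -= eta * (X.T @ Error_3);      b_h1  -= eta * Error_3.sum(0, keepdims=True)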

For forward propagation, the dimensions of the output from the first hidden layer must be compatible with the input dimensions of the second hidden layer.

As mentioned above, your input has dimension (n,d). The output from hidden layer 1 will have dimension (n,h1). So the weights and bias for the second hidden layer must have shapes (h1,h2) and (1,h2) respectively.

That is, w_h2 will have dimension (h1,h2) and b_h2 will be (1,h2).

For the output layer, w_output will have dimension (h2,t) and b_output will be (1,t), where t is the number of output units.

You have to apply the same dimension matching when computing the gradients in backpropagation.
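A quick way to verify this dimension matching is to assert the shapes layer by layer; a small sketch with made-up sizes:

    import numpy as np

    n, d, h1, h2, t = 10, 4, 8, 6, 3      # made-up sizes

    w_h1, b_h1   = np.zeros((d,  h1)), np.zeros((1, h1))
    w_h2, b_h2   = np.zeros((h1, h2)), np.zeros((1, h2))
    w_out, b_out = np.zeros((h2, t)),  np.zeros((1, t))

    X = np.zeros((n, d))
    out1 = X @ w_h1 + b_h1;      assert out1.shape == (n, h1)
    out2 = out1 @ w_h2 + b_h2;   assert out2.shape == (n, h2)
    out  = out2 @ w_out + b_out; assert out.shape  == (n, t)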
