What's the proper way to do back propagation in a deep fully connected neural network for binary classification
I tried to implement a deep fully connected neural network for binary classification using Python and NumPy, with gradient descent as the optimization algorithm.
It turns out my model is heavily underfitting, even after 1000 epochs. My loss never improves beyond 0.69321. I checked my weight derivatives and instantly realized they are very small (as small as 1e-7); such small gradients mean the gradient descent updates barely move the weights, so the model never reaches the global minimum. I will detail the math/pseudocode for forward and backward propagation below; please let me know if I'm on the right track. I will follow the naming convention used in DeepLearning.ai by Andrew Ng.
Say we have a 4-layer neural network with only one node at the output layer, classifying between 0 and 1:

```
X -> Z1 -> A1 -> Z2 -> A2 -> Z3 -> A3 -> Z4 -> A4
```
Forward propagation

```
Z1 = W1 dot_product X + B1
A1 = tanh_activation(Z1)
Z2 = W2 dot_product A1 + B2
A2 = tanh_activation(Z2)
Z3 = W3 dot_product A2 + B3
A3 = tanh_activation(Z3)
Z4 = W4 dot_product A3 + B4
A4 = sigmoid_activation(Z4)
```
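As a sanity check, the forward pass described above can be sketched in NumPy. The layer sizes, weight scale, and random data below are made up purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy shapes, made up for illustration: 3 input features, hidden layers
# of 4, 4, and 2 tanh units, and a single sigmoid output unit.
sizes = [3, 4, 4, 2, 1]
m = 5  # number of samples

X = rng.standard_normal((sizes[0], m))
W = [rng.standard_normal((sizes[l + 1], sizes[l])) * 0.1 for l in range(4)]
B = [np.zeros((sizes[l + 1], 1)) for l in range(4)]

def sigmoid_activation(z):
    return 1.0 / (1.0 + np.exp(-z))

# Forward pass: tanh on the three hidden layers, sigmoid on the output.
A = X
for l in range(4):
    Z = W[l] @ A + B[l]
    A = np.tanh(Z) if l < 3 else sigmoid_activation(Z)

A4 = A  # shape (1, m); each entry is a probability in (0, 1)
```

Columns of `X` are samples, so each `W[l]` has shape (units in layer l+1, units in layer l) and the biases broadcast across the batch.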
Backward propagation

```
DA4 = -(Y / A4 - (1 - Y) / (1 - A4))        ( derivative of the loss w.r.t. the output activations )
DZ4 = DA4 * derivative_tanh(Z4)             ( derivative of the tanh activation, which I assume is 1 - Z4^2 )
DW4 = ( DZ4 dot_product A3.T ) / total_number_of_samples
DB4 = np.sum(DZ4, axis=1, keepdims=True) / total_number_of_samples
DA3 = W4.T dot_product DZ4
DZ3 = DA3 * derivative_tanh(Z3)
DW3 = ( DZ3 dot_product A2.T ) / total_number_of_samples
DB3 = np.sum(DZ3, axis=1, keepdims=True) / total_number_of_samples
DA2 = W3.T dot_product DZ3
DZ2 = DA2 * derivative_tanh(Z2)
DW2 = ( DZ2 dot_product A1.T ) / total_number_of_samples
DB2 = np.sum(DZ2, axis=1, keepdims=True) / total_number_of_samples
DA1 = W2.T dot_product DZ2
DZ1 = DA1 * derivative_tanh(Z1)
DW1 = ( DZ1 dot_product X.T ) / total_number_of_samples
DB1 = np.sum(DZ1, axis=1, keepdims=True) / total_number_of_samples
```
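For reference, the backward pass can be sketched generically over the layers with the same toy shapes and random data as before (all made up for illustration). Note one deliberate departure from the pseudocode above: since A4 comes from a sigmoid, this sketch uses the sigmoid's local derivative A4 * (1 - A4) at the output layer, and applies the tanh derivative to tanh(Z) rather than to Z itself:

```python
import numpy as np

rng = np.random.default_rng(0)
sizes = [3, 4, 4, 2, 1]  # toy layer sizes, for illustration only
m = 5                    # number of samples

X = rng.standard_normal((sizes[0], m))
Y = rng.integers(0, 2, size=(1, m)).astype(float)
W = [rng.standard_normal((sizes[l + 1], sizes[l])) * 0.1 for l in range(4)]
B = [np.zeros((sizes[l + 1], 1)) for l in range(4)]

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Forward pass, caching Z and A for each layer.
As, Zs = [X], []
for l in range(4):
    Z = W[l] @ As[-1] + B[l]
    Zs.append(Z)
    As.append(np.tanh(Z) if l < 3 else sigmoid(Z))
A4 = As[-1]

# Gradient of binary cross-entropy w.r.t. the output activations.
dA = -(Y / A4 - (1 - Y) / (1 - A4))

grads = []
for l in reversed(range(4)):
    if l == 3:
        dZ = dA * A4 * (1 - A4)               # sigmoid'(Z4) = A4 * (1 - A4)
    else:
        dZ = dA * (1 - np.tanh(Zs[l]) ** 2)   # tanh'(Z) = 1 - tanh(Z)^2
    dW = (dZ @ As[l].T) / m
    dB = np.sum(dZ, axis=1, keepdims=True) / m
    grads.append((dW, dB))
    dA = W[l].T @ dZ                          # propagate to the previous layer
grads.reverse()  # grads[l] holds (dW, dB) for layer l+1
```

Each `dW` in `grads` has the same shape as the corresponding weight matrix, so a gradient descent update is just `W[l] -= lr * grads[l][0]`.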
This is my tanh implementation:

```python
def tanh_activation(x):
    return np.tanh(x)
```

and my tanh derivative implementation:

```python
def derivative_tanh(x):
    return 1 - np.power(x, 2)
```
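One way to probe such a derivative helper is a finite-difference check. The sketch below (input value chosen arbitrarily) shows that the expression `1 - x**2` equals the true tanh derivative only when `x` is already the activation tanh(z), not the pre-activation z:

```python
import numpy as np

def tanh_activation(x):
    return np.tanh(x)

def derivative_tanh(x):
    return 1 - np.power(x, 2)

z = 0.7                  # arbitrary pre-activation value
a = tanh_activation(z)   # the corresponding activation

# Finite-difference estimate of d/dz tanh(z) at z.
eps = 1e-6
numeric = (np.tanh(z + eps) - np.tanh(z - eps)) / (2 * eps)

# Passing the activation a reproduces the true derivative 1 - tanh(z)^2 ...
assert abs(derivative_tanh(a) - numeric) < 1e-6
# ... while passing the pre-activation z gives a different number.
assert abs(derivative_tanh(z) - numeric) > 0.1
```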
After the above backpropagation steps I updated the weights and biases using gradient descent with their respective derivatives. But no matter how many times I run the algorithm, the model never improves its loss beyond 0.69, and the derivatives of the output weights (dW4 in my case) stay as low as 1e-7. I'm assuming that either my derivative_tanh function or my calculation of dZ is really off, which is causing bad gradient values to propagate back through the network. Please share your thoughts on whether my implementation of backprop is valid or not. TIA.
I went through Back propagation gradient descent calculus and How to optimize weights of neural network, and many other blogs, but couldn't find what I was looking for.
I found a fix to my problem and answered it here: What's the proper way to do back propagation in a deep fully connected neural network. I suggest closing the thread.