Why does NN training loss flatten?
I implemented a deep learning neural network from scratch, without using any Python frameworks such as TensorFlow or Keras.
The problem is that no matter what I change in my code (adjusting the learning rate, changing the number of layers or nodes, or switching the activation functions from sigmoid to ReLU to leaky ReLU), the training loss starts at 6.98 but always converges to 3.24.
Why is that?
Please review my forward- and back-propagation code; maybe there is something wrong in it that I could not identify.
My hidden layers use leaky ReLU and the final layer uses sigmoid activation. I am trying to classify the MNIST handwritten digits.
Code:
# FORWARD PROPAGATION
for i in range(layers-1):
    cache["a"+str(i+1)] = lrelu(np.dot(param["w"+str(i+1)], cache["a"+str(i)]) + param["b"+str(i+1)])
cache["a"+str(layers)] = sigmoid(np.dot(param["w"+str(layers)], cache["a"+str(layers-1)]) + param["b"+str(layers)])
yn = cache["a"+str(layers)]
m = X.shape[1]
cost = -np.sum(y*np.log(yn) + (1-y)*np.log(1-yn)) / m
if j % 10 == 0:
    print(cost)
    costs.append(cost)
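(The snippet calls lrelu, sigmoid, and lreluDer without showing them. For context, here is a minimal sketch of what such helpers typically look like; the leak slope of 0.01 and the convention of taking the derivative from the cached activation are assumptions, not taken from the question.)

```python
import numpy as np

def lrelu(z, slope=0.01):
    # Leaky ReLU: identity for positive inputs, a small slope for negative ones.
    return np.where(z > 0, z, slope * z)

def lreluDer(a, slope=0.01):
    # Derivative recovered from the cached activation a = lrelu(z):
    # 1 where the unit fired (a > 0), the leak slope otherwise.
    return np.where(a > 0, 1.0, slope)

def sigmoid(z):
    # Numerically stable logistic function: avoids overflow in exp
    # by choosing the branch based on the sign of z.
    return np.where(z >= 0,
                    1.0 / (1.0 + np.exp(-np.abs(z))),
                    np.exp(-np.abs(z)) / (1.0 + np.exp(-np.abs(z))))
```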
# BACKPROPAGATION
grad = {"dz"+str(layers): yn - y}
for i in range(layers):
    grad["dw"+str(layers-i)] = np.dot(grad["dz"+str(layers-i)], cache["a"+str(layers-i-1)].T) / m
    grad["db"+str(layers-i)] = np.sum(grad["dz"+str(layers-i)], axis=1, keepdims=True) / m
    if i < layers-1:
        grad["dz"+str(layers-i-1)] = np.dot(param["w"+str(layers-i)].T, grad["dz"+str(layers-i)]) * lreluDer(cache["a"+str(layers-i-1)])
for i in range(layers):
    param["w"+str(i+1)] = param["w"+str(i+1)] - alpha*grad["dw"+str(i+1)]
    param["b"+str(i+1)] = param["b"+str(i+1)] - alpha*grad["db"+str(i+1)]
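(A standard way to verify backprop code like the above is a finite-difference gradient check: perturb each weight, re-run the forward pass, and compare the numerical slope to the analytic gradient. Below is a generic sketch; `forward_cost` is a hypothetical function that runs your forward pass and returns the scalar cost for the current `param`.)

```python
import numpy as np

def grad_check(forward_cost, param, grad, key, eps=1e-5):
    """Compare grad["d"+key] against a centred finite difference of the cost.

    Returns the worst relative error over all entries of param[key];
    values around 1e-7 or smaller suggest the analytic gradient is correct.
    """
    w = param[key]
    numeric = np.zeros_like(w)
    it = np.nditer(w, flags=["multi_index"])
    for _ in it:
        i = it.multi_index
        old = w[i]
        w[i] = old + eps
        c_plus = forward_cost(param)   # cost with the entry nudged up
        w[i] = old - eps
        c_minus = forward_cost(param)  # cost with the entry nudged down
        w[i] = old                     # restore the original value
        numeric[i] = (c_plus - c_minus) / (2 * eps)
    analytic = grad["d" + key]
    return np.max(np.abs(numeric - analytic) /
                  np.maximum(1e-8, np.abs(numeric) + np.abs(analytic)))
```

Run it on a small batch before training; if the relative error is large for some layer, the bug is in that layer's backprop term.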
The implementation looks okay. While you could converge to the same value with different models, learning rates, or hyperparameters, what is suspicious is getting the same starting value every time, 6.98 in your case.
I suspect it has to do with your initialisation. If you initialise all your weights to zero, you never break symmetry: every unit in a layer computes the same output and receives the same gradient, so the units stay identical throughout training. That is explained here and here in adequate detail.
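(To illustrate, here is a minimal sketch of random initialisation that breaks symmetry, using He scaling, which is a common choice for ReLU-family activations. `layer_dims` is a hypothetical list of layer sizes such as `[784, 128, 64, 10]`; the dictionary keys match the `param["w1"]`, `param["b1"]`, ... convention in the question's code.)

```python
import numpy as np

def init_params(layer_dims, seed=0):
    rng = np.random.default_rng(seed)
    param = {}
    for i in range(1, len(layer_dims)):
        # Small random weights break symmetry; the sqrt(2/fan_in) scale
        # (He initialisation) keeps activation variance roughly constant
        # across ReLU-style layers.
        param["w" + str(i)] = (rng.standard_normal((layer_dims[i], layer_dims[i-1]))
                               * np.sqrt(2.0 / layer_dims[i-1]))
        # Zero biases are fine once the weights already differ per unit.
        param["b" + str(i)] = np.zeros((layer_dims[i], 1))
    return param
```

With zero weights, every hidden unit in a layer stays a clone of its neighbours forever, which is exactly the kind of stuck, repeatable loss curve you are describing.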