
Neural Network Issue with Back Propagation Calculation

I am currently playing around with building a simple neural network to identify handwritten digits using the MNIST database. On the first training iteration I get a random assortment of probabilities in my output arrays, which is expected. However, when I run it for more than one iteration, my output arrays nearly all drop to zeros in every index position. It seems that the calculation of the weight adjustments being back-propagated is causing the issue, but I can't figure out why.

import numpy as np

def sigmoid(x):
    a = 1 / (1 + np.exp(-x))
    return a

def sigmoid_derivative(x):
    # x is assumed to already be a sigmoid output (an activation), not a pre-activation
    return x * (1 - x)

def train(n, inputs):
    input_layer = inputs
    weights1 = 2 * np.random.random((784, 16)) - 1
    weights2 = 2 * np.random.random((16, 10)) - 1

    for i in range(n):
        trained_hidden = sigmoid(np.dot(input_layer, weights1))
        trained_outputs = sigmoid(np.dot(trained_hidden, weights2))

        o_error = (outputs - trained_outputs)  # 'outputs' is the global one-hot target array defined below
        o_adjustments = o_error * sigmoid_derivative(trained_outputs)

        h_error = np.dot(o_adjustments, weights2.T)
        h_adjustments = h_error * sigmoid_derivative(trained_hidden)

        w1 = np.dot(input_layer.T, h_adjustments)
        w2 = np.dot(trained_hidden.T, o_adjustments)

        weights1 += w1
        weights2 += w2

    return trained_outputs

I am using NumPy arrays. The input is a (10000 x 784) array of greyscale values between 0 and 1, and the output is a (10000 x 10) array with a 1 at the index position of the actual digit.

x_train, t_train, x_test, t_test = mnist.load()

inputs = x_test/256

outputs = np.zeros((10000,10), dtype=int)

for i in range(10000):
    x = t_test[int(i)]
    outputs[i][x] = 1

set = train(10, inputs)

I have used a number of resources to build this: the theory comes from the 3Blue1Brown neural network series, and the code closely follows the example provided here: https://enlight.nyc/projects/neural-network/

Edit: As per @9000's suggestion, here is a printout of each step for one example. Looking at the results, it looks like w1 (the weight-adjustment calculation) is the issue, but even after going over it repeatedly I cannot figure out why it is incorrect. Any help is appreciated.

Edit 2: I have included a second printout of the same example on the second training run.

First Run

trained_hidden [0.87880514 0.4789476  0.38500953 0.00142838 0.01373613 0.37572408 0.53673194 0.11774215 0.99989426 0.0547656  0.20645864 0.85484692 0.99903171 0.88929566 0.00673453 0.03816501]

trained_output [0.33244312 0.26289407 0.79917376 0.95143406 0.90780616 0.2100068 0.66253735 0.57961972 0.28231436 0.15963378]

o_error [ 0.66755688 -0.26289407 -0.79917376 -0.95143406 -0.90780616 -0.2100068 -0.66253735 -0.57961972 -0.28231436 -0.15963378]

o-adjustment [ 0.14814735 -0.05094382 -0.12826344 -0.04396319 -0.07597805 -0.03484096 -0.14813117 -0.14123055 -0.05720055 -0.02141501]

h_error [-0.00359599  0.18884347  0.15954247 -0.14839811  0.2081496  -0.01152563 0.03262859 -0.46315722 -0.06974061 -0.46774417 -0.00690463 -0.44303219 -0.16267084 -0.02505235 -0.12866526  0.22212537]

h_adjustment [-3.82997246e-04  4.71271721e-02  3.77760172e-02 -2.11665993e-04 2.81989626e-03 -2.70339996e-03  8.11312465e-03 -4.81122794e-02 -7.37327102e-06 -2.42134002e-02 -1.13120886e-03 -5.49730579e-02 -1.57359939e-04 -2.46637616e-03 -8.60664795e-04  8.15387570e-03]

w1 [0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]

w2 [-111.70644608 -164.50671691 -254.60942018 -205.06537232 -330.43317768 -94.6976 -346.78221607 -272.22044431 -249.54889015  -75.99543441]

weights1 [-0.09535479 -0.09824519 -0.11582134 -0.65075843 -0.65593035  0.77593957 -0.0406199 0.12669151  0.79979191 -0.52502487 -0.2433578 0.16617536 -0.25711996  0.92995152 -0.40922601 -0.63029133]

weights2 [-112.24597022 -164.86741004 -254.21715269 -205.27326963 -331.18579697 -95.07615178 -347.04311247 -271.82206581 -250.04075852  -76.69273265]

Second Run

trained_hidden [0.00000000e+000 1.00000000e+000 1.00000000e+000 3.77659154e-181 1.00000000e+000 0.00000000e+000 0.00000000e+000 0.00000000e+000 1.00000000e+000 0.00000000e+000 1.00000000e+000 0.00000000e+000 2.71000625e-055 0.00000000e+000 0.00000000e+000 1.00000000e+000]

trained_output [0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]

o_error [1. 0. 0. 0. 0. 0. 0. 0. 0. 0.]

o-adjustment [0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]

h_error [0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]

h_adjustment [0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]

w1 [0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]

w2 [0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]

weights1 [-0.09535479 -0.09824519 -0.11582134 -0.65075843 -0.65593035  0.77593957 -0.0406199   0.12669151  0.79979191 -0.52502487 -0.2433578   0.16617536 -0.25711996  0.92995152 -0.40922601 -0.63029133]

weights2 [-112.24597022 -164.86741004 -254.21715269 -205.27326963 -331.18579697 -95.07615178 -347.04311247 -271.82206581 -250.04075852  -76.69273265]

First of all, are you sure your printouts of weights1 and weights2 are correct? They are identical between the two runs even though the outputs are very different, which seems very suspicious to me.

I briefly checked your derivatives and, from the little I looked into, they look correct. However, I see two mistakes. First, when updating the weights you actually want to subtract the derivative from the weights: the gradient always points uphill, and since you want to minimize the loss you need to move in the downhill direction. Second, you are using the full derivative as the update. In neural networks a learning rate (e.g. 0.001) is basically always used as a multiplier on the update; if you don't scale the gradient down before applying it, it can overshoot really hard and, for example, push all your weights to very large values, which makes the optimization very unstable. Your printout shows exactly that: after one update, weights2 contains values around -100 to -350, so the output layer's pre-activations are large negative numbers and sigmoid squashes every output to roughly 0, which is what you see on the second run.

So my suggestion is to replace:

weights1 += w1
weights2 += w2

with:

learning_rate = 0.001
weights1 -= w1 * learning_rate
weights2 -= w2 * learning_rate
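The 0.001 is only a starting point, not a magic number: for plain gradient descent, values anywhere from 0.001 to 0.1 are common, and it usually needs tuning. If the loss still blows up, shrink it by a factor of 10; if training barely moves, grow it.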

Also, a general rule of thumb when debugging neural networks is to use a minimal example that your network should be able to fit: choose a single sample from your dataset and look at the update in each iteration (e.g. with a debugger); this will tell you a lot. If you can't fit a single example, you can't fit 10000.
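To make that concrete, here is a minimal, self-contained sketch of that single-sample check. The random sample, the seed, the learning rate of 0.1 and the 100 iterations are placeholder values chosen for illustration, not taken from your code; the error is written as prediction minus target so that subtracting the scaled gradient moves downhill, and the printed loss should drop steadily:

import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def sigmoid_derivative(s):
    # s is the sigmoid output (activation), not the pre-activation
    return s * (1 - s)

rng = np.random.default_rng(0)
x = rng.random((1, 784))      # stand-in for one flattened 28x28 image, values in [0, 1]
t = np.zeros((1, 10))
t[0, 3] = 1                   # one-hot target, e.g. the digit 3

weights1 = 2 * rng.random((784, 16)) - 1
weights2 = 2 * rng.random((16, 10)) - 1
learning_rate = 0.1

for i in range(100):
    hidden = sigmoid(np.dot(x, weights1))
    output = sigmoid(np.dot(hidden, weights2))

    o_error = output - t      # prediction minus target
    o_delta = o_error * sigmoid_derivative(output)
    h_delta = np.dot(o_delta, weights2.T) * sigmoid_derivative(hidden)

    # step against the gradient, scaled by the learning rate
    weights2 -= learning_rate * np.dot(hidden.T, o_delta)
    weights1 -= learning_rate * np.dot(x.T, h_delta)

    if i % 10 == 0:
        print(i, np.mean(o_error ** 2))  # mean squared error should decrease

If the loss decreases here but your full run still collapses, the remaining problem is in the data handling rather than in the back-propagation itself.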
