
Error is not decreasing with epoch in neural network

I have a dataset of images of digits with only 2 classes (7 and 9), so it is a binary classification problem. Each input image is 28*28 = 784 features, so I am using a neural network with 784 neurons in the input layer, 100 and 50 neurons in the two hidden layers, and 2 neurons in the output layer, with a learning rate of 0.3.
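To make the setup concrete, here is a rough sketch of that architecture (written in Keras purely for illustration of the layer sizes and learning rate; it is not the actual training code, and the activations and loss shown are assumptions):

from tensorflow import keras

# Sketch of the described network: 784 inputs -> 100 -> 50 hidden -> 2 outputs,
# trained with plain SGD at learning rate 0.3 (illustration only).
model = keras.Sequential([
    keras.layers.Input(shape=(784,)),
    keras.layers.Dense(100, activation="sigmoid"),
    keras.layers.Dense(50, activation="sigmoid"),
    keras.layers.Dense(2, activation="softmax"),   # one output neuron per class (7 vs 9)
])
model.compile(optimizer=keras.optimizers.SGD(learning_rate=0.3),
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])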

My question is: why is the error not decreasing with epochs? Am I doing something wrong? I have 7125 samples in the training dataset.

>epoch=0, lrate=0.300, error=7124.996
>epoch=1, lrate=0.300, error=7124.996
>epoch=2, lrate=0.300, error=7124.996
>epoch=3, lrate=0.300, error=7124.996
>epoch=4, lrate=0.300, error=7124.995
>epoch=5, lrate=0.300, error=7124.995
>epoch=6, lrate=0.300, error=7124.995
>epoch=7, lrate=0.300, error=7124.995
>epoch=8, lrate=0.300, error=7124.995
>epoch=9, lrate=0.300, error=7124.995
>epoch=10, lrate=0.300, error=7124.995
>epoch=11, lrate=0.300, error=7124.994
>epoch=12, lrate=0.300, error=7124.994
>epoch=13, lrate=0.300, error=7124.994
>epoch=14, lrate=0.300, error=7124.994
>epoch=15, lrate=0.300, error=7124.994
>epoch=16, lrate=0.300, error=7124.993
>epoch=17, lrate=0.300, error=7124.993
>epoch=18, lrate=0.300, error=7124.993 
>epoch=19, lrate=0.300, error=7124.992 
>epoch=20, lrate=0.300, error=7124.992 
>epoch=21, lrate=0.300, error=7124.992 
>epoch=22, lrate=0.300, error=7124.991 
>epoch=23, lrate=0.300, error=7124.991 
>epoch=24, lrate=0.300, error=7124.990 
>epoch=25, lrate=0.300, error=7124.989 
>epoch=26, lrate=0.300, error=7124.989 
>epoch=27, lrate=0.300, error=7124.988 
>epoch=28, lrate=0.300, error=7124.987 
>epoch=29, lrate=0.300, error=7124.985 
>epoch=30, lrate=0.300, error=7124.984 
>epoch=31, lrate=0.300, error=7124.982 
>epoch=32, lrate=0.300, error=7124.980 
>epoch=33, lrate=0.300, error=7124.977 
>epoch=34, lrate=0.300, error=7124.972 
>epoch=35, lrate=0.300, error=7124.966 
>epoch=36, lrate=0.300, error=7124.957 
>epoch=37, lrate=0.300, error=7124.940 
>epoch=38, lrate=0.300, error=7124.899 
>epoch=39, lrate=0.300, error=7124.544 
>epoch=40, lrate=0.300, error=6322.611 
>epoch=41, lrate=0.300, error=5425.721
>epoch=42, lrate=0.300, error=4852.422 
>epoch=43, lrate=0.300, error=4384.062 
>epoch=44, lrate=0.300, error=4204.247 
>epoch=45, lrate=0.300, error=4091.508 
>epoch=46, lrate=0.300, error=4030.757 
>epoch=47, lrate=0.300, error=4014.341 
>epoch=48, lrate=0.300, error=3999.759 
>epoch=49, lrate=0.300, error=4008.330 
>epoch=50, lrate=0.300, error=3995.592 
>epoch=51, lrate=0.300, error=3964.337 
>epoch=52, lrate=0.300, error=3952.369 
>epoch=53, lrate=0.300, error=3965.271 
>epoch=54, lrate=0.300, error=3989.814 
>epoch=55, lrate=0.300, error=3972.481 
>epoch=56, lrate=0.300, error=3937.723 
>epoch=57, lrate=0.300, error=3917.152 
>epoch=58, lrate=0.300, error=3901.988
>epoch=59, lrate=0.300, error=3920.768

If I change the number of neurons in the hidden layers to 5 and 2, I get a much better result. Why is that?

>epoch=0, lrate=0.300, error=4634.128, l_rate=0.300
>epoch=1, lrate=0.300, error=4561.231, l_rate=0.300
>epoch=2, lrate=0.300, error=3430.602, l_rate=0.300
>epoch=3, lrate=0.300, error=927.599, l_rate=0.300
>epoch=4, lrate=0.300, error=843.441, l_rate=0.300
>epoch=5, lrate=0.300, error=741.719, l_rate=0.300
>epoch=6, lrate=0.300, error=734.094, l_rate=0.300
>epoch=7, lrate=0.300, error=691.922, l_rate=0.300
>epoch=8, lrate=0.300, error=705.822, l_rate=0.300
>epoch=9, lrate=0.300, error=629.065, l_rate=0.300
>epoch=10, lrate=0.300, error=588.232, l_rate=0.300
>epoch=11, lrate=0.300, error=592.619, l_rate=0.300
>epoch=12, lrate=0.300, error=554.380, l_rate=0.300
>epoch=13, lrate=0.300, error=555.677, l_rate=0.300
>epoch=14, lrate=0.300, error=555.798, l_rate=0.300
>epoch=15, lrate=0.300, error=523.214, l_rate=0.300
>epoch=16, lrate=0.300, error=530.260, l_rate=0.300
>epoch=17, lrate=0.300, error=491.709, l_rate=0.300
>epoch=18, lrate=0.300, error=469.119, l_rate=0.300
>epoch=19, lrate=0.300, error=472.025, l_rate=0.300
>epoch=20, lrate=0.300, error=473.940, l_rate=0.300
>epoch=21, lrate=0.300, error=438.288, l_rate=0.300
>epoch=22, lrate=0.300, error=412.492, l_rate=0.300
>epoch=23, lrate=0.300, error=424.129, l_rate=0.300
>epoch=24, lrate=0.300, error=427.414, l_rate=0.300
>epoch=25, lrate=0.300, error=435.418, l_rate=0.300
>epoch=26, lrate=0.300, error=406.067, l_rate=0.300
>epoch=27, lrate=0.300, error=411.439, l_rate=0.300
>epoch=28, lrate=0.300, error=373.220, l_rate=0.300
>epoch=29, lrate=0.300, error=381.987, l_rate=0.300
>epoch=30, lrate=0.300, error=359.585, l_rate=0.300
>epoch=31, lrate=0.300, error=368.407, l_rate=0.300
>epoch=32, lrate=0.300, error=351.560, l_rate=0.300
>epoch=33, lrate=0.300, error=359.028, l_rate=0.300
>epoch=34, lrate=0.300, error=371.987, l_rate=0.300
>epoch=35, lrate=0.300, error=336.106, l_rate=0.300
>epoch=36, lrate=0.300, error=318.453, l_rate=0.300

That's a common problem; it's usually caused by the design of your NN and/or your data. To avoid it, you may need to reconsider your design decisions.

From your question, it seems you have committed to a specific design, and changing a few of those choices could help a lot. Let's have a look:

  • 0.3 learning rate: that is a very high value for the learning rate. You generally want to lower it; training takes longer, but the learning becomes much more consistent. The deeplearning4j documentation advises: "Typical values for the learning rate are in the range of 0.1 to 1e-6, though the optimal learning rate is usually data (and network architecture) specific. Some simple advice is to start by trying three different learning rates – 1e-1, 1e-3, and 1e-6 – to get a rough idea of what it should be, before further tuning this. Ideally, they run models with different learning rates simultaneously to save time." A sketch of such a sweep is shown after this list.
  • 100 + 50 neurons in the hidden layers: while I do not know for sure what kind of task this NN is performing, this count sounds low to me given that you have 784 neurons in the input layer. Have a look at the paper "What Size Neural Network Gives Optimal Generalization?". Please update your question with more detail to help us understand your problem.
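A minimal way to follow the learning-rate advice above is to train the same model with a few rates spanning several orders of magnitude and compare the loss curves. The sketch below reuses the Keras-style model from the question's sketch and assumes X_train and y_train are already loaded (both are assumptions, not your code):

from tensorflow import keras

def build_model(learning_rate):
    # Same 784 -> 100 -> 50 -> 2 layout as described in the question,
    # parameterized by the learning rate we want to test.
    model = keras.Sequential([
        keras.layers.Input(shape=(784,)),
        keras.layers.Dense(100, activation="sigmoid"),
        keras.layers.Dense(50, activation="sigmoid"),
        keras.layers.Dense(2, activation="softmax"),
    ])
    model.compile(optimizer=keras.optimizers.SGD(learning_rate=learning_rate),
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

# Coarse sweep over three orders of magnitude, as the quote suggests,
# before refining around whichever rate behaves best.
# (X_train and y_train are assumed to be loaded elsewhere, with integer labels 0/1.)
for lr in (1e-1, 1e-3, 1e-6):
    history = build_model(lr).fit(X_train, y_train, epochs=10, verbose=0)
    print(f"lr={lr:g}  final training loss={history.history['loss'][-1]:.4f}")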

Your first model had many more parameters (first hidden layer: 78,400 weights and 100 biases; second hidden layer: 5,000 weights and 50 biases), and you presumably initialized the weights and biases randomly. Random initialization usually lands you somewhere near 50% accuracy, so I am a little confused about why you were getting close to 0% (I assume that, with 7,125 training examples, an error of 7124.990 means you were classifying essentially all of them incorrectly). In any case, with such a high learning rate, a weight that is very wrong gets pushed right past its optimum, so it ends up almost as wrong as before but now sits on the opposite side of the error curve. It makes sense that a model with vastly more parameters bouncing between decidedly wrong values would take more training epochs to show any improvement in error.
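You can see that overshooting effect on a one-dimensional toy problem, independent of your network: gradient descent on f(w) = w^2 (gradient 2w, minimum at w = 0) with a small and a too-large step size. This is just a sketch of the mechanism, not your model.

def descend(step_size, w=5.0, steps=5):
    # One-dimensional gradient descent on f(w) = w^2, whose gradient is 2*w.
    path = [round(w, 4)]
    for _ in range(steps):
        w = w - step_size * 2 * w   # w_new = w - lr * f'(w)
        path.append(round(w, 4))
    return path

print(descend(0.1))  # [5.0, 4.0, 3.2, 2.56, 2.048, 1.6384] -- converges smoothly
print(descend(0.9))  # [5.0, -4.0, 3.2, -2.56, 2.048, -1.6384] -- bounces across the minimum
print(descend(1.1))  # [5.0, -6.0, 7.2, -8.64, 10.368, -12.4416] -- diverges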

One more thing to consider: since you have set this up as a binary classification problem, you need only one output neuron, not two. In your training labels, 0 can encode 7 and 1 can encode 9. Use a single sigmoid output neuron, which gives a value between 0 and 1, and threshold your prediction at 0.5:

if output < 0.5:
    prediction = 7
else:
    prediction = 9
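Putting the pieces together, a minimal sketch of the single-output setup could look like the following (again a Keras-style illustration, not a definitive implementation; the variable names and the smaller learning rate are assumptions):

import numpy as np
from tensorflow import keras

# (X_train, y_train_digits and X_test are assumed to be loaded elsewhere;
#  y_train_digits holds the raw labels 7 and 9.)
y_train_binary = np.where(y_train_digits == 7, 0, 1)   # 0 encodes 7, 1 encodes 9

model = keras.Sequential([
    keras.layers.Input(shape=(784,)),
    keras.layers.Dense(100, activation="sigmoid"),
    keras.layers.Dense(50, activation="sigmoid"),
    keras.layers.Dense(1, activation="sigmoid"),       # single output in (0, 1)
])
model.compile(optimizer=keras.optimizers.SGD(learning_rate=0.01),  # smaller rate, per the advice above
              loss="binary_crossentropy", metrics=["accuracy"])
model.fit(X_train, y_train_binary, epochs=20)

# Threshold the sigmoid output at 0.5 to map predictions back to digits.
probs = model.predict(X_test).ravel()
predicted_digits = np.where(probs < 0.5, 7, 9)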
