I have a dataset of images of digits with only two classes, 7 and 9, so it's a binary classification problem. Each input image is 28×28 = 784 features, so I am using a neural network with 784 neurons in the input layer, 100 and 50 neurons in the hidden layers, and 2 neurons in the output layer, with a learning rate of 0.3.
My question is: why is the error not decreasing with the epochs? Am I doing something wrong? I have 7125 samples in the training dataset.
>epoch=0, lrate=0.300, error=7124.996
>epoch=1, lrate=0.300, error=7124.996
>epoch=2, lrate=0.300, error=7124.996
>epoch=3, lrate=0.300, error=7124.996
>epoch=4, lrate=0.300, error=7124.995
>epoch=5, lrate=0.300, error=7124.995
>epoch=6, lrate=0.300, error=7124.995
>epoch=7, lrate=0.300, error=7124.995
>epoch=8, lrate=0.300, error=7124.995
>epoch=9, lrate=0.300, error=7124.995
>epoch=10, lrate=0.300, error=7124.995
>epoch=11, lrate=0.300, error=7124.994
>epoch=12, lrate=0.300, error=7124.994
>epoch=13, lrate=0.300, error=7124.994
>epoch=14, lrate=0.300, error=7124.994
>epoch=15, lrate=0.300, error=7124.994
>epoch=16, lrate=0.300, error=7124.993
>epoch=17, lrate=0.300, error=7124.993
>epoch=18, lrate=0.300, error=7124.993
>epoch=19, lrate=0.300, error=7124.992
>epoch=20, lrate=0.300, error=7124.992
>epoch=21, lrate=0.300, error=7124.992
>epoch=22, lrate=0.300, error=7124.991
>epoch=23, lrate=0.300, error=7124.991
>epoch=24, lrate=0.300, error=7124.990
>epoch=25, lrate=0.300, error=7124.989
>epoch=26, lrate=0.300, error=7124.989
>epoch=27, lrate=0.300, error=7124.988
>epoch=28, lrate=0.300, error=7124.987
>epoch=29, lrate=0.300, error=7124.985
>epoch=30, lrate=0.300, error=7124.984
>epoch=31, lrate=0.300, error=7124.982
>epoch=32, lrate=0.300, error=7124.980
>epoch=33, lrate=0.300, error=7124.977
>epoch=34, lrate=0.300, error=7124.972
>epoch=35, lrate=0.300, error=7124.966
>epoch=36, lrate=0.300, error=7124.957
>epoch=37, lrate=0.300, error=7124.940
>epoch=38, lrate=0.300, error=7124.899
>epoch=39, lrate=0.300, error=7124.544
>epoch=40, lrate=0.300, error=6322.611
>epoch=41, lrate=0.300, error=5425.721
>epoch=42, lrate=0.300, error=4852.422
>epoch=43, lrate=0.300, error=4384.062
>epoch=44, lrate=0.300, error=4204.247
>epoch=45, lrate=0.300, error=4091.508
>epoch=46, lrate=0.300, error=4030.757
>epoch=47, lrate=0.300, error=4014.341
>epoch=48, lrate=0.300, error=3999.759
>epoch=49, lrate=0.300, error=4008.330
>epoch=50, lrate=0.300, error=3995.592
>epoch=51, lrate=0.300, error=3964.337
>epoch=52, lrate=0.300, error=3952.369
>epoch=53, lrate=0.300, error=3965.271
>epoch=54, lrate=0.300, error=3989.814
>epoch=55, lrate=0.300, error=3972.481
>epoch=56, lrate=0.300, error=3937.723
>epoch=57, lrate=0.300, error=3917.152
>epoch=58, lrate=0.300, error=3901.988
>epoch=59, lrate=0.300, error=3920.768
If I change the hidden layers to 5 and 2 neurons, I get a better result. Why is that?
>epoch=0, lrate=0.300, error=4634.128, l_rate=0.300
>epoch=1, lrate=0.300, error=4561.231, l_rate=0.300
>epoch=2, lrate=0.300, error=3430.602, l_rate=0.300
>epoch=3, lrate=0.300, error=927.599, l_rate=0.300
>epoch=4, lrate=0.300, error=843.441, l_rate=0.300
>epoch=5, lrate=0.300, error=741.719, l_rate=0.300
>epoch=6, lrate=0.300, error=734.094, l_rate=0.300
>epoch=7, lrate=0.300, error=691.922, l_rate=0.300
>epoch=8, lrate=0.300, error=705.822, l_rate=0.300
>epoch=9, lrate=0.300, error=629.065, l_rate=0.300
>epoch=10, lrate=0.300, error=588.232, l_rate=0.300
>epoch=11, lrate=0.300, error=592.619, l_rate=0.300
>epoch=12, lrate=0.300, error=554.380, l_rate=0.300
>epoch=13, lrate=0.300, error=555.677, l_rate=0.300
>epoch=14, lrate=0.300, error=555.798, l_rate=0.300
>epoch=15, lrate=0.300, error=523.214, l_rate=0.300
>epoch=16, lrate=0.300, error=530.260, l_rate=0.300
>epoch=17, lrate=0.300, error=491.709, l_rate=0.300
>epoch=18, lrate=0.300, error=469.119, l_rate=0.300
>epoch=19, lrate=0.300, error=472.025, l_rate=0.300
>epoch=20, lrate=0.300, error=473.940, l_rate=0.300
>epoch=21, lrate=0.300, error=438.288, l_rate=0.300
>epoch=22, lrate=0.300, error=412.492, l_rate=0.300
>epoch=23, lrate=0.300, error=424.129, l_rate=0.300
>epoch=24, lrate=0.300, error=427.414, l_rate=0.300
>epoch=25, lrate=0.300, error=435.418, l_rate=0.300
>epoch=26, lrate=0.300, error=406.067, l_rate=0.300
>epoch=27, lrate=0.300, error=411.439, l_rate=0.300
>epoch=28, lrate=0.300, error=373.220, l_rate=0.300
>epoch=29, lrate=0.300, error=381.987, l_rate=0.300
>epoch=30, lrate=0.300, error=359.585, l_rate=0.300
>epoch=31, lrate=0.300, error=368.407, l_rate=0.300
>epoch=32, lrate=0.300, error=351.560, l_rate=0.300
>epoch=33, lrate=0.300, error=359.028, l_rate=0.300
>epoch=34, lrate=0.300, error=371.987, l_rate=0.300
>epoch=35, lrate=0.300, error=336.106, l_rate=0.300
>epoch=36, lrate=0.300, error=318.453, l_rate=0.300
That's a common problem; it is usually caused by the design of your network and/or your data. To avoid it, you may need to reconsider your design decisions.
From your question, it seems you settled on a specific design that, if changed, could help a lot. Let's have a look:
Your first model has many more parameters (first hidden layer: 78,400 weights and 100 biases; second hidden layer: 5,000 weights and 50 biases), and you probably initialized the weights and biases randomly. Random initialization usually gets you somewhere near 50% accuracy, so I am a little confused about why you were getting close to 0% (I assume that with 7125 training examples, an error of error=7124.990 means you were classifying all of them incorrectly). In any case, with such a high learning rate, what happens is that you take a weight that is very wrong and move it right past its optimum, so it is still almost as wrong as before but now sits on the opposite side of the error curve. It makes sense that a model with vastly more parameters bouncing between decidedly wrong values would need more training epochs before the error improves.
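The overshooting effect is easy to see on a toy one-dimensional loss. The sketch below (not your network, just an illustration) runs plain gradient descent on f(w) = w², where the gradient is 2w. With a small step size the weight converges toward the minimum; with a step size near the stability limit, each update flips the sign of the weight and it stays almost as wrong as before, exactly the behavior described above:

```python
# Toy illustration: gradient descent on f(w) = w**2, whose gradient is 2*w.
# The step sizes here are made up for the demo.

def descend(lr, w=10.0, steps=20):
    for _ in range(steps):
        w -= lr * 2 * w  # each update multiplies w by (1 - 2*lr)
    return w

print(descend(0.1))   # factor 0.8 per step: w shrinks steadily toward 0
print(descend(0.99))  # factor -0.98 per step: w flips sign and barely shrinks
```

With lr = 0.1 the weight ends up near 0.1; with lr = 0.99 it is still above 6 in magnitude after 20 steps, bouncing from one side of the curve to the other. Anything above lr = 1.0 here would actually diverge.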
One more thing to consider: since you have set this up as a binary classification problem, you need only one output neuron, not two. In your training labels, let 0 encode 7 and 1 encodes 9. Use a single sigmoid output neuron, which gives a value between 0 and 1, and threshold your prediction at 0.5:
if output < 0.5:
    prediction = 7
else:
    prediction = 9
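As a minimal end-to-end sketch of this single-output setup (the weights below are made up for the demo, not your trained network), the sigmoid squashes the last layer's pre-activation into (0, 1), and the threshold turns that into a class label:

```python
import numpy as np

def sigmoid(z):
    # squashes any real number into the interval (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

hidden = np.array([0.2, 0.7])   # activations from the last hidden layer
w = np.array([1.5, -0.8])       # hypothetical output weights
b = 0.1                         # hypothetical output bias

output = sigmoid(hidden @ w + b)
prediction = 7 if output < 0.5 else 9
```

During training you would compare `output` against the 0/1 label (e.g. with cross-entropy loss); the 0.5 threshold is only applied at prediction time.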