
Matlab neural network handwritten digit recognition, output going to indifference

Using Matlab I am trying to construct a neural network that can classify handwritten digits that are 30x30 pixels. I use backpropagation to find the correct weights and biases. The network starts with 900 inputs, then has 2 hidden layers with 16 neurons each, and it ends with 10 outputs. Each output neuron has a value between 0 and 1 that represents the belief that the input should be classified as a certain digit. The problem is that after training, the output becomes almost indifferent to the input and it converges to a uniform belief of about 0.1 for each output.

My approach is to take each 30x30-pixel image and reshape it into a 900x1 vector (note that 'Images_vector' is already in this vector format when it is loaded). The weights and biases are initialized with random values between 0 and 1. I am using stochastic gradient descent to update the weights and biases with 10 randomly selected samples per batch. The equations are as described by Nielsen.
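For reference, these are the backpropagation equations for the quadratic cost with sigmoid activations in Nielsen's notation, which the gradient section of the script below is meant to implement:

$$\delta^L = (a^L - y) \odot \sigma'(z^L), \qquad \delta^l = \left((w^{l+1})^T \delta^{l+1}\right) \odot \sigma'(z^l)$$

$$\frac{\partial C}{\partial b^l} = \delta^l, \qquad \frac{\partial C}{\partial w^l} = \delta^l \, (a^{l-1})^T$$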

The script is as follows.

%% Inputs
numberofbatches = 1000;
batchsize = 10;
alpha = 1;
cutoff = 8000;
layers = [900 16 16 10];

%% Initialization
rng(0);

load('Images_vector')
Images_vector = reshape(Images_vector', 1, 10000);
labels = [ones(1,1000) 2*ones(1,1000) 3*ones(1,1000) 4*ones(1,1000) 5*ones(1,1000) 6*ones(1,1000) 7*ones(1,1000) 8*ones(1,1000) 9*ones(1,1000) 10*ones(1,1000)];
newOrder = randperm(10000);
Images_vector = Images_vector(newOrder);
labels = labels(newOrder);
images_training = Images_vector(1:cutoff);
images_testing = Images_vector(cutoff + 1:10000);

w = cell(1,length(layers) - 1);
b = cell(1,length(layers));
dCdw = cell(1,length(layers) - 1);
dCdb = cell(1,length(layers));
for i = 1:length(layers) - 1
    w{i} = rand(layers(i+1),layers(i));
    b{i+1} = rand(layers(i+1),1);
end

%% Learning process
batches = randi([1 cutoff - batchsize],1,numberofbatches);

cost = zeros(numberofbatches,1);
c = 1;
for batch = batches
    for i = 1:length(layers) - 1
        dCdw{i} = zeros(layers(i+1),layers(i));
        dCdb{i+1} = zeros(layers(i+1),1);
    end

    for n = batch:batch+batchsize
        y = zeros(10,1);
        disp(labels(n))
        y(labels(n)) = 1;

        % Network
        a{1} = images_training{n};
        z{2} = w{1} * a{1} + b{2};
        a{2} = sigmoid(0, z{2});
        z{3} = w{2} * a{2} + b{3};
        a{3} = sigmoid(0, z{3});
        z{4} = w{3} * a{3} + b{4};
        a{4} = sigmoid(0, z{4});

        % Cost
        cost(c) = sum((a{4} - y).^2) / 2;

        % Gradient
        d{4} = (a{4} - y) .* sigmoid(1, z{4});
        d{3} = (w{3}' * d{4}) .* sigmoid(1, z{3});
        d{2} = (w{2}' * d{3}) .* sigmoid(1, z{2});

        dCdb{4} = dCdb{4} + d{4} / 10;
        dCdb{3} = dCdb{3} + d{3} / 10;
        dCdb{2} = dCdb{2} + d{2} / 10;

        dCdw{3} = dCdw{3} + (a{3} * d{4}')' / 10;
        dCdw{2} = dCdw{2} + (a{2} * d{3}')' / 10;
        dCdw{1} = dCdw{1} + (a{1} * d{2}')' / 10;

        c = c + 1;
    end

    % Adjustment
    b{4} = b{4} - dCdb{4} * alpha;
    b{3} = b{3} - dCdb{3} * alpha;
    b{2} = b{2} - dCdb{2} * alpha;
    w{3} = w{3} - dCdw{3} * alpha;
    w{2} = w{2} - dCdw{2} * alpha;
    w{1} = w{1} - dCdw{1} * alpha;
end

figure
plot(cost)
ylabel 'Cost'
xlabel 'Batches trained on'

The sigmoid function is defined as follows:

function y = sigmoid(derivative, x)

if derivative == 0
    y = 1 ./ (1 + exp(-x));
else
    y = sigmoid(0, x) .* (1 - sigmoid(0, x));
end

end
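
As a quick sanity check of this helper (not part of the original script, just an illustration), the logistic value and its derivative at zero should be 0.5 and 0.25 respectively:

% sanity check of sigmoid.m: value and derivative at x = 0
assert(abs(sigmoid(0, 0) - 0.5)  < 1e-12)   % logistic(0)  = 0.5
assert(abs(sigmoid(1, 0) - 0.25) < 1e-12)   % logistic'(0) = 0.25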

Other than this, I have also tried having one of each digit in every batch, but this gave the same result. I have also tried varying the batch size, the number of batches and alpha, but with no success.

Does anyone know what I am doing wrong?

Correct me if I'm wrong: you have 10000 samples in your data, which you divide into 1000 batches of 10 samples. Your training process consists of running over these 10000 samples once.

This might be too little; normally a training process consists of several epochs (one epoch = iterating over every sample once). You can try going over your batches multiple times.
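
A minimal sketch of what that could look like, reusing the variables from your script (the value of numberofepochs is only illustrative and would need tuning):

numberofepochs = 30;                               % illustrative value
cost = zeros(numberofepochs * numberofbatches, 1);
c = 1;
for epoch = 1:numberofepochs
    % draw a fresh set of random mini-batches every epoch
    batches = randi([1 cutoff - batchsize], 1, numberofbatches);
    for batch = batches
        % ... same gradient computation and weight/bias
        %     update as in your original batch loop ...
    end
end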

Also, for 900 inputs your network seems small. Try it with more neurons in the second layer (a possible configuration is sketched below). Hope it helps!
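
Only the layers vector would need to change for that; the sizes here are just example values, not tuned:

layers = [900 100 16 10];   % e.g. a wider first hidden layer than [900 16 16 10]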
