MATLAB neural network for handwritten digit recognition: output becomes indifferent to the input

Using MATLAB I am trying to construct a neural network that can classify handwritten digits of 30x30 pixels. I use backpropagation to find the correct weights and biases. The network starts with 900 inputs, then has two hidden layers with 16 neurons each, and ends with 10 outputs. Each output neuron has a value between 0 and 1 that represents the belief that the input should be classified as a certain digit. The problem is that after training, the output becomes almost indifferent to the input and goes towards a uniform belief of 0.1 for each output.

My approach is to take each 30x30-pixel image and reshape it into a 900x1 vector (note that 'Images_vector' is already in this vector format when it is loaded). The weights and biases are initialized with random values between 0 and 1. I am using stochastic gradient descent to update the weights and biases, with 10 randomly selected samples per batch. The equations are as described by Nielsen.

The script is as follows.

%% Inputs
numberofbatches = 1000;
batchsize = 10;
alpha = 1;
cutoff = 8000;
layers = [900 16 16 10];

%% Initialization
rng(0);

load('Images_vector')
Images_vector = reshape(Images_vector', 1, 10000);
labels = repelem(1:10, 1000);  % 1000 of each label 1..10, same as [ones(1,1000) 2*ones(1,1000) ... 10*ones(1,1000)]
newOrder = randperm(10000);
Images_vector = Images_vector(newOrder);
labels = labels(newOrder);
images_training = Images_vector(1:cutoff);
images_testing = Images_vector(cutoff + 1:10000);

w = cell(1,length(layers) - 1);
b = cell(1,length(layers));
dCdw = cell(1,length(layers) - 1);
dCdb = cell(1,length(layers));
for i = 1:length(layers) - 1
    w{i} = rand(layers(i+1),layers(i));
    b{i+1} = rand(layers(i+1),1);
end

%% Learning process
batches = randi([1 cutoff - batchsize],1,numberofbatches);

cost = zeros(numberofbatches,1);
c = 1;
for batch = batches
    for i = 1:length(layers) - 1
        dCdw{i} = zeros(layers(i+1),layers(i));
        dCdb{i+1} = zeros(layers(i+1),1);
    end

    for n = batch:batch+batchsize
        y = zeros(10,1);
        disp(labels(n))
        y(labels(n)) = 1;

        % Network
        a{1} = images_training{n};
        z{2} = w{1} * a{1} + b{2};
        a{2} = sigmoid(0, z{2});
        z{3} = w{2} * a{2} + b{3};
        a{3} = sigmoid(0, z{3});
        z{4} = w{3} * a{3} + b{4};
        a{4} = sigmoid(0, z{4});

        % Cost
        cost(c) = sum((a{4} - y).^2) / 2;

        % Gradient
        d{4} = (a{4} - y) .* sigmoid(1, z{4});
        d{3} = (w{3}' * d{4}) .* sigmoid(1, z{3});
        d{2} = (w{2}' * d{3}) .* sigmoid(1, z{2});

        dCdb{4} = dCdb{4} + d{4} / 10;
        dCdb{3} = dCdb{3} + d{3} / 10;
        dCdb{2} = dCdb{2} + d{2} / 10;

        dCdw{3} = dCdw{3} + (a{3} * d{4}')' / 10;
        dCdw{2} = dCdw{2} + (a{2} * d{3}')' / 10;
        dCdw{1} = dCdw{1} + (a{1} * d{2}')' / 10;

        c = c + 1;
    end

    % Adjustment
    b{4} = b{4} - dCdb{4} * alpha;
    b{3} = b{3} - dCdb{3} * alpha;
    b{2} = b{2} - dCdb{2} * alpha;
    w{3} = w{3} - dCdw{3} * alpha;
    w{2} = w{2} - dCdw{2} * alpha;
    w{1} = w{1} - dCdw{1} * alpha;
end

figure
plot(cost)
ylabel 'Cost'
xlabel 'Batches trained on'

The sigmoid function is defined as follows.

function y = sigmoid(derivative, x)
% SIGMOID Logistic function or its derivative, applied element-wise.
%   sigmoid(0, x) returns 1 ./ (1 + exp(-x)).
%   sigmoid(1, x) returns the derivative sigma(x) .* (1 - sigma(x)).

if derivative == 0
    y = 1 ./ (1 + exp(-x));
else
    y = sigmoid(0, x) .* (1 - sigmoid(0, x));
end

end
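
As a quick sanity check of the calling convention:

sigmoid(0, 0)   % 0.5, the logistic function at zero
sigmoid(1, 0)   % 0.25, its maximum slope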

Other than this I have also tried to have one of each digit in each batch, but this gave the same result. I have also tried varying the batch size, the number of batches, and alpha, but without success.
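
For reference, a minimal sketch of how such a one-of-each-digit batch can be drawn ('labels' and 'cutoff' as defined in the script above):

batchIdx = zeros(1, 10);
for digit = 1:10
    candidates = find(labels(1:cutoff) == digit);         % training samples of this digit
    batchIdx(digit) = candidates(randi(numel(candidates)));
end
% batchIdx now holds one random training index per class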

Does anyone know what I am doing wrong?

Correct me if I'm wrong: you have 10000 samples in your data, which you divide into 1000 batches of 10 samples each. Your training process consists of running over these 10000 samples once.

This might be too little; normally a training process consists of several epochs (one epoch = iterating over every sample once). You can try going over your batches multiple times.
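
A minimal sketch of what that could look like, wrapping the existing batch loop in an outer epoch loop ('numepochs' is an assumed new parameter, not part of the original script):

numepochs = 30;                                                 % assumed value; tune as needed
for epoch = 1:numepochs
    batches = randi([1 cutoff - batchsize],1,numberofbatches);  % re-draw batches each epoch
    for batch = batches
        % ... existing gradient computation and weight/bias adjustment ...
    end
end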

Also, for 900 inputs your network seems small. Try it with more neurons in the second layer. Hope it helps!
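
For example (these sizes are only a guess to illustrate the suggestion, not values from the answer):

layers = [900 64 32 10];   % wider hidden layers than the original [900 16 16 10]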
