
Learning XOR with deep neural network

I am a novice at deep learning, so I began with the simplest test case: learning XOR.

In the new edition of Digital Image Processing by G & W, the authors give an example of learning XOR with a deep net with 3 layers: input, hidden, and output (each layer has 2 neurons), with a sigmoid as the activation function.

For network initialization they say: "We used alpha = 1.0, an initial set of Gaussian random weights of zero mean and standard deviation of 0.02" (alpha is the gradient descent learning rate). Training was done with 4 labeled examples:

X = [1 -1 -1 1;1 -1 1 -1];%MATLAB syntax
R = [1 1 0 0;0 0 1 1];%Labels

I have written the following MATLAB code to implement the network learning process:

function output = neuralNet4e(input,specs)


NumPat = size(input.X,2);%Number of patterns
NumLayers = length(specs.W);
for kEpoch = 1:specs.NumEpochs

    % forward pass

    A = cell(NumLayers,1);%Output of each neuron in each layer
    derZ = cell(NumLayers,1);%Activation function derivative on each neuron dot product 
    A{1} = input.X;

    for kLayer = 2:NumLayers

       B = repmat(specs.b{kLayer},1,NumPat);
       Z = specs.W{kLayer} * A{kLayer - 1} + B;
       derZ{kLayer} = specs.activationFuncDerive(Z);
       A{kLayer} = specs.activationFunc(Z);

    end

    % backprop

    D =  cell(NumLayers,1);
    D{NumLayers} = (A{NumLayers} - input.R).* derZ{NumLayers};
    for kLayer = (NumLayers-1):-1:2

        D{kLayer} = (specs.W{kLayer + 1}' * D{kLayer + 1}).*derZ{kLayer};

    end

    %Update weights and biases

    for kLayer = 2:NumLayers

        specs.W{kLayer} = specs.W{kLayer} - specs.alpha * D{kLayer} * A{kLayer - 1}' ;
        specs.b{kLayer} = specs.b{kLayer} - specs.alpha * sum(D{kLayer},2);

    end

end

output.A = A;

end

Now, when I use their setup (i.e., weight initialization with std = 0.02):

clearvars
s = 0.02;
input.X = [1 -1 -1 1;1 -1 1 -1];
input.R = [1 1 0 0;0 0 1 1];
specs.W = {[];s * randn(2,2);s * randn(2,2)};
specs.b = {[];s * randn(2,1);s * randn(2,1)};
specs.activationFunc = @(x) 1./(1 + exp(-x));
specs.activationFuncDerive = @(x) exp(-x)./(1 + exp(-x)).^2;
specs.NumEpochs = 1e4;
specs.alpha = 1;
output = neuralNet4e(input,specs);
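For reference, the forward pass of this 2-2-2 net at initialization can be sketched in plain Python (a rough translation of the setup above, using only the information in the question; the `sigmoid` and `forward` helpers are my own names). With weights drawn at std 0.02, every pre-activation is near zero, so every output starts near sigmoid(0) = 0.5:

```python
import math
import random

random.seed(0)

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

s = 0.02  # std of the Gaussian initialization, as in the question
# 2-2-2 net: one hidden layer; weight matrices are 2x2, biases length-2
W2 = [[random.gauss(0, s) for _ in range(2)] for _ in range(2)]
b2 = [random.gauss(0, s) for _ in range(2)]
W3 = [[random.gauss(0, s) for _ in range(2)] for _ in range(2)]
b3 = [random.gauss(0, s) for _ in range(2)]

def forward(x):
    """Forward pass through the hidden and output layers."""
    a2 = [sigmoid(W2[i][0] * x[0] + W2[i][1] * x[1] + b2[i]) for i in range(2)]
    a3 = [sigmoid(W3[i][0] * a2[0] + W3[i][1] * a2[1] + b3[i]) for i in range(2)]
    return a3

# the four XOR patterns (the columns of X in the MATLAB code)
X = [[1, 1], [-1, -1], [-1, 1], [1, -1]]
outputs = [forward(x) for x in X]
# with s = 0.02 every pre-activation is tiny, so every output is close to 0.5
```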

I'm getting (after 10000 epochs) that the final output of the net is output.A{3} = [0.5 0.5 0.5 0.5;0.5 0.5 0.5 0.5],

but when I changed s = 0.02; to s = 1; I got output.A{3} = [0.989 0.987 0.010 0.010;0.010 0.012 0.98 0.98], as it should.

Is it possible to get these results with s = 0.02, and am I doing something wrong in my code? Or is the standard deviation of 0.02 just a typo?

Based on your code, I don't see any errors. The result you got,

[0.5 0.5 0.5 0.5; 0.5 0.5 0.5 0.5]

is a typical sign that the network has not learned at all: every output is stuck at the sigmoid midpoint, sigmoid(0) = 0.5. This happens when the learning signal is too weak to move the weights away from their starting values.

In your example, s = 0.02 scales the randomized weights and biases down to very small values, so all pre-activations start near zero and the gradients propagated back to the first layer are tiny. Changing that to s = 1 leaves the randomized values unscaled, and the gradients are then large enough for training to make progress.

To make the s = 0.02 case work, you can try increasing the number of epochs or raising alpha so the weights can grow out of the near-zero region.
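To see what s actually does to the learning signal, one can compare the size of the hidden-layer backprop delta at initialization under the two settings. This is a minimal pure-Python sketch of the question's 2-2-2 sigmoid net (the `hidden_delta` helper is my own, not from the book or the MATLAB code; the same base Gaussian draws are reused for both values of s so only the scale differs):

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def hidden_delta(s, seed=0):
    """Total size of the hidden-layer backprop delta at initialization,
    for Gaussian init with standard deviation s."""
    rng = random.Random(seed)
    g = [rng.gauss(0, 1) for _ in range(12)]  # same base draws for every s
    W2 = [[s * g[0], s * g[1]], [s * g[2], s * g[3]]]
    b2 = [s * g[4], s * g[5]]
    W3 = [[s * g[6], s * g[7]], [s * g[8], s * g[9]]]
    b3 = [s * g[10], s * g[11]]
    x, r = [1, 1], [1, 0]  # first training pattern and its label
    # forward pass
    a2 = [sigmoid(W2[i][0] * x[0] + W2[i][1] * x[1] + b2[i]) for i in range(2)]
    a3 = [sigmoid(W3[i][0] * a2[0] + W3[i][1] * a2[1] + b3[i]) for i in range(2)]
    # backprop deltas (squared-error loss; sigmoid' written as a * (1 - a))
    d3 = [(a3[i] - r[i]) * a3[i] * (1 - a3[i]) for i in range(2)]
    d2 = [(W3[0][j] * d3[0] + W3[1][j] * d3[1]) * a2[j] * (1 - a2[j])
          for j in range(2)]
    return sum(abs(v) for v in d2)

# averaged over a few seeds, the hidden layer's update is far smaller with
# s = 0.02 than with s = 1, so the first layer barely moves and the outputs
# stay pinned near 0.5
small = sum(hidden_delta(0.02, k) for k in range(10))
large = sum(hidden_delta(1.0, k) for k in range(10))
```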

Hope this helps.
