train the network with matlab matconvnet

Question

I want to train my network using matlab and matconvnet-1.0-beta25. My problem is regression and I use pdist as loss function to get mse. The inputs data is 56*56*64*6000 and the targets data is 56*56*64*6000 and network architecture is as follows:

opts.networkType = 'simplenn' ;
opts = vl_argparse(opts, varargin) ;

lr = [.01 2] ;

% Define network CIFAR10-quick
net.layers = {} ;

% Block 1
net.layers{end+1} = struct('type', 'conv', ...
                           'weights', {{0.01*randn(5,5,64,32, 'single'), zeros(1, 32, 'single')}}, ...
                           'learningRate', lr, ...
                           'stride', 1, ...
                           'pad', 2) ;
net.layers{end+1} = struct('type', 'relu') ;
net.layers{end+1} = struct('type', 'conv', ...
                           'weights', {{0.05*randn(5,5,32,16, 'single'), zeros(1,16,'single')}}, ...
                           'learningRate', .1*lr, ...
                           'stride', 1, ...
                           'pad', 2) ;
net.layers{end+1} = struct('type', 'relu') ;
net.layers{end+1} = struct('type', 'conv', ...
                           'weights', {{0.01*randn(5,5,16,8, 'single'), zeros(1, 8, 'single')}}, ...
                           'learningRate', lr, ...
                           'stride', 1, ...
                           'pad', 2) ;
net.layers{end+1} = struct('type', 'relu') ;
net.layers{end+1} = struct('type', 'conv', ...
                           'weights', {{0.05*randn(5,5,8,16, 'single'), zeros(1,16,'single')}}, ...
                           'learningRate', .1*lr, ...
                           'stride', 1, ...
                           'pad', 2) ;
net.layers{end+1} = struct('type', 'relu') ;
net.layers{end+1} = struct('type', 'conv', ...
                           'weights', {{0.01*randn(5,5,16,32, 'single'), zeros(1, 32, 'single')}}, ...
                           'learningRate', lr, ...
                           'stride', 1, ...
                           'pad', 2) ;
net.layers{end+1} = struct('type', 'relu') ;
net.layers{end+1} = struct('type', 'conv', ...
                           'weights', {{0.05*randn(5,5,32,64, 'single'), zeros(1,64,'single')}}, ...
                           'learningRate', .1*lr, ...
                           'stride', 1, ...
                           'pad', 2) ;
net.layers{end+1} = struct('type', 'relu') ;
% Loss layer
net.layers{end+1} = struct('type', 'pdist') ;

% Meta parameters
net.meta.inputSize = [56 56 64] ;
net.meta.trainOpts.learningRate = [0.0005*ones(1,30) 0.0005*ones(1,10) 0.0005*ones(1,5)] ;
net.meta.trainOpts.weightDecay = 0.0001 ;
net.meta.trainOpts.batchSize = 100 ;
net.meta.trainOpts.numEpochs = numel(net.meta.trainOpts.learningRate) ;

% Fill in default values

net = vl_simplenn_tidy(net) ;

I change getSimpleNNBatch(imdb, batch) function in ncnn_train (the name of mine) as follows:

function [images, labels] = getSimpleNNBatch(imdb, batch)
    images = imdb.images.data(:,:,:,batch) ;
    labels = imdb.images.labels(:,:,:,batch) ;
    if rand > 0.5, images=fliplr(images) ; 
end

because my label is multi-dimensional. Also I change errorFunction in cnn_train from multiclasses to none :

opts.errorFunction = 'none' ;

and change the error variable from:

% accumulate errors
error = sum([error, [...
  sum(double(gather(res(end).x))) ;
  reshape(params.errorFunction(params, labels, res),[],1) ; ]],2) ;

to:

% accumulate errors
error = sum([error, [...
  mean(mean(mean(double(gather(res(end).x))))) ;
  reshape(params.errorFunction(params, labels, res),[],1) ; ]],2) ;

My first question is why the res(end).x third dimension in above command is one instead of 64? this is 56*56*1*100 (100 is the batch).

Have I made a mistake?

here is the results:

train: epoch 01:   2/ 40: 10.1 (27.0) Hz objective: 21360.722
train: epoch 01:   3/ 40: 13.0 (30.0) Hz objective: 67328685.873
...
train: epoch 01:  39/ 40: 29.7 (29.6) Hz objective: 5179175.587
train: epoch 01:  40/ 40: 29.8 (30.6) Hz objective: 5049697.440
val: epoch 01:   1/ 10: 87.3 (87.3) Hz objective: 49.512
val: epoch 01:   2/ 10: 88.9 (90.5) Hz objective: 50.012
...
val: epoch 01:   9/ 10: 88.2 (88.2) Hz objective: 49.936
val: epoch 01:  10/ 10: 88.1 (87.3) Hz objective: 49.962
train: epoch 02:   1/ 40: 30.2 (30.2) Hz objective: 49.650
train: epoch 02:   2/ 40: 30.3 (30.4) Hz objective: 49.704
...
train: epoch 02:  39/ 40: 30.2 (31.6) Hz objective: 49.739
train: epoch 02:  40/ 40: 30.3 (31.0) Hz objective: 49.722
val: epoch 02:   1/ 10: 91.8 (91.8) Hz objective: 49.687
val: epoch 02:   2/ 10: 92.0 (92.2) Hz objective: 49.831
...
val: epoch 02:   9/ 10: 92.0 (88.5) Hz objective: 49.931
val: epoch 02:  10/ 10: 91.9 (91.1) Hz objective: 49.962
train: epoch 03:   1/ 40: 31.7 (31.7) Hz objective: 49.014
train: epoch 03:   2/ 40: 31.2 (30.8) Hz objective: 49.237
...

here is my network schema image

Answer 1

Two inputs of pdist have got nxmx64x100 size as below and as this mentioned, the output of pdist has got the same height and width, but depth equal to one. About the correctness of error definition, you should debug and check the size and definition accurately.

train the network with matlab matconvnet

Question

1 answers

solution1
0 2018-01-15 16:15:53

train the network with matlab matconvnet

Question

1 answers

solution1 0 2018-01-15 16:15:53

solution1
0 2018-01-15 16:15:53