
Accuracy of LibSVM decreases

After getting my TestLabel and TrainLabel vectors, I trained an SVM with LIBSVM and got an accuracy of 97.4359% (c = 1 and g = 0.00375):

model = svmtrain(TrainLabel, TrainVec, '-c 1 -g 0.00375');
[predict_label, accuracy, dec_values] = svmpredict(TestLabel, TestVec, model);

I then search for the best c and g:

% Grid search: with '-v 5', svmtrain returns the 5-fold cross-validation accuracy
bestcv = 0;
for log2c = -1:3
  for log2g = -4:1
    cmd = ['-v 5 -c ', num2str(2^log2c), ' -g ', num2str(2^log2g)];
    cv = svmtrain(TrainLabel, TrainVec, cmd);
    if (cv >= bestcv)
      bestcv = cv; bestc = 2^log2c; bestg = 2^log2g;
    end
    fprintf('%g %g %g (best c=%g, g=%g, rate=%g)\n', log2c, log2g, cv, bestc, bestg, bestcv);
  end
end

The search returns c = 8 and g = 0.125.

I then train the model again with these parameters:

model = svmtrain(TrainLabel, TrainVec, '-c 8 -g 0.125');
[predict_label, accuracy, dec_values] = svmpredict(TestLabel, TestVec, model);

I get an accuracy of 82.0513%.

How is it possible for the accuracy to decrease? Shouldn't it increase? Or am I making a mistake?

The accuracies that you were getting during parameter tuning are biased upwards, because you were selecting parameters on the same data that you were training on. This is often fine for parameter tuning.

However, if you want those accuracies to be accurate estimates of the true generalization error on your final test set, then you have to add an additional wrapper of cross-validation or another resampling scheme (i.e. nested cross-validation).

Here is a very clear paper that outlines the general issue (in the related context of feature selection): http://www.pnas.org/content/99/10/6562.abstract
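
For illustration, a minimal sketch of such a nested scheme with the LIBSVM MATLAB interface could look like the following. The 5 outer folds, the c/g grid, and the reuse of the TrainLabel/TrainVec variables from the question are assumptions, not something prescribed here:

% Sketch of nested cross-validation: the grid search runs only on the
% outer-training part of each fold, so the outer-fold accuracy is an
% unbiased estimate of generalization error.
n      = numel(TrainLabel);                  % assumes TrainLabel / TrainVec from the question
nouter = 5;                                  % assumed number of outer folds
outer  = repmat(1:nouter, 1, ceil(n/nouter));
outer  = outer(randperm(n));                 % random outer-fold assignment

outerAcc = zeros(nouter, 1);
for k = 1:nouter
  trIdx = (outer ~= k);
  teIdx = (outer == k);

  % Inner 5-fold grid search on the outer-training data only
  bestcv = 0; bestc = 1; bestg = 1;
  for log2c = -1:3
    for log2g = -4:1
      cmd = ['-v 5 -c ', num2str(2^log2c), ' -g ', num2str(2^log2g)];
      cv  = svmtrain(TrainLabel(trIdx), TrainVec(trIdx,:), cmd);
      if cv >= bestcv
        bestcv = cv; bestc = 2^log2c; bestg = 2^log2g;
      end
    end
  end

  % Retrain with the selected parameters and test on the held-out outer fold
  model = svmtrain(TrainLabel(trIdx), TrainVec(trIdx,:), ...
                   ['-c ', num2str(bestc), ' -g ', num2str(bestg)]);
  [~, acc, ~] = svmpredict(TrainLabel(teIdx), TrainVec(teIdx,:), model);
  outerAcc(k) = acc(1);                      % acc(1) is the classification accuracy in %
end
fprintf('Nested CV accuracy: %.2f%%\n', mean(outerAcc));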

EDIT :

I usually set up cross-validation like this:

n     = 95;    % total number of observations
nfold = 10;    % desired number of folds

% Set up CV folds: assign each observation to one of nfold folds at random
inds = repmat(1:nfold, 1, ceil(n/nfold));   % at least n fold labels
inds = inds(randperm(n));                   % keep n of them, in random order

% Loop over folds
for i = 1:nfold
  datapart = data(inds ~= i, :);

  % do some stuff

  % save results
end

% combine results
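
As a hypothetical filled-in version of that skeleton (assuming a labels vector alongside data, and simply reusing the '-c 8 -g 0.125' parameters from the question), the "do some stuff" and "combine results" steps could be:

% Train on the other folds, test on fold i, then average the fold accuracies
foldAcc = zeros(nfold, 1);
for i = 1:nfold
  trainIdx = (inds ~= i);
  testIdx  = (inds == i);
  model       = svmtrain(labels(trainIdx), data(trainIdx,:), '-c 8 -g 0.125');
  [~, acc, ~] = svmpredict(labels(testIdx), data(testIdx,:), model);
  foldAcc(i)  = acc(1);              % acc(1) is the classification accuracy in %
end
meanAcc = mean(foldAcc);             % combined cross-validated accuracy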

To do cross-validation, you are supposed to split your training data. Here you test on the training data to find your best set of parameters, which is not a good measure. You should use something like the following pseudo code:

for param = set of parameters to test
  [trainTrain, trainVal] = randomly split (trainSet); % you can repeat this several times and take the mean accuracy
  model = svmtrain(trainTrain, param);
  acc   = svmpredict(trainVal, model);
  if acc is the best so far
     bestParam = param;
  end
end
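
A concrete MATLAB version of that pseudo code with LIBSVM might look like this (the 80/20 split, the c/g grid, and the TrainLabel/TrainVec names are assumptions for illustration):

% Hold-out validation: split the training set once, keep the parameters
% that do best on the validation part.
n     = numel(TrainLabel);
perm  = randperm(n);
nTr   = round(0.8 * n);                      % assumed 80/20 split
trIdx = perm(1:nTr);
vaIdx = perm(nTr+1:end);

bestAcc = -inf;
for log2c = -1:3
  for log2g = -4:1
    param = ['-c ', num2str(2^log2c), ' -g ', num2str(2^log2g)];
    model = svmtrain(TrainLabel(trIdx), TrainVec(trIdx,:), param);
    [~, acc, ~] = svmpredict(TrainLabel(vaIdx), TrainVec(vaIdx,:), model);
    if acc(1) > bestAcc
      bestAcc = acc(1); bestParam = param;
    end
  end
end
% Finally, retrain on the full training set with bestParam and evaluate once on the test set.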


 