What changes can I make to LIBSVM's svmtrain to improve the accuracy of a spam classifier?

Question

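I am using Octave 5.2.0 and LIBSVM 3.24 to build a spam classifier. Without LIBSVM, I get > 99% accuracy on both the test and the training data. With LIBSVM, however, I only get 68-69% accuracy. What changes should I make to my LIBSVM options? This is the code I am using: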

model = svmtrain(X, y,'-c 0.1 -t 2 -s 0 -g 1000');
p = svmpredict(y,X,model);

Answer 1

Are you familiar with LibSVM's settings?

% libSVM options:
% -s svm_type: set type of SVM (default 0)
%   0 -- C-SVC
%   1 -- nu-SVC
%   2 -- one-class SVM
%   3 -- epsilon-SVR
%   4 -- nu-SVR
% -t kernel_type: set type of kernel function (default 2)
%   0 -- linear: u'*v
%   1 -- polynomial: (gamma*u'*v + coef0)^degree
%   2 -- radial basis function: exp(-gamma*|u-v|^2)
%   3 -- sigmoid: tanh(gamma*u'*v + coef0)
% -d degree: set degree in kernel function (default 3)
% -g gamma: set gamma in kernel function (default 1/num_features)
% -r coef0: set coef0 in kernel function (default 0)
% -c cost: set the parameter C of C-SVC, epsilon-SVR, and nu-SVR (default 1)
% -n nu: set the parameter nu of nu-SVC, one-class SVM, and nu-SVR (default 0.5)
% -p epsilon: set the epsilon in loss function of epsilon-SVR (default 0.1)
% -m cachesize: set cache memory size in MB (default 100)
% -e epsilon: set tolerance of termination criterion (default 0.001)
% -h shrinking: whether to use the shrinking heuristics, 0 or 1 (default 1)
% -b probability_estimates: whether to train a SVC or SVR model for probability estimates, 0 or 1 (default 0)
% -wi weight: set the parameter C of class i to weight*C, for C-SVC (default 1)

So your settings -s 0 -t 2 -g 1000 -c 0.1 translate to a C-SVM ( -s 0 ) with a Gaussian kernel ( -t 2 ), a very large scaling factor ( -g 1000 ), and a cost for violations that is lower than the default ( -c 0.1 ).
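To get a feel for what a gamma of 1000 does to the RBF kernel exp(-gamma*|u-v|^2) listed above, here is a minimal sketch. The two vectors are made-up binary word indicators, not the asker's data, so treat the numbers as illustrative only.

```octave
% Hypothetical binary feature vectors (assumed for illustration only).
u = [1 0 1 0 0];
v = [1 0 0 0 0];                  % differs from u in exactly one feature
d2 = sum((u - v).^2);             % squared Euclidean distance between u and v

% The asker's gamma, gamma = 1, and LIBSVM's default 1/num_features.
gammas = [1000, 1, 1/numel(u)];
K = exp(-gammas * d2);            % RBF kernel value for each gamma

for i = 1:numel(gammas)
  printf('gamma = %g  ->  K(u,v) = %g\n', gammas(i), K(i));
end
```

With gamma = 1000, the kernel value for two points at squared distance 1 underflows to zero, so every sample looks unrelated to every other sample. Whether that applies to the real data depends on how its features are scaled.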

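I suggest trying the defaults first ( -s 0 -t 2 ) and then increasing the cost -c . Your gamma looks absurdly large, but nobody can judge that without knowing your data. Look into hyperparameter optimization, which is exactly about setting these values. There is a lot of work on this, but I am only familiar with regression analysis. If in doubt, run a global optimization over these parameters, for example with gridsearch or ga.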
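A grid search over -c and -g could look roughly like the sketch below. It assumes X (an m x n double matrix) and y (an m x 1 double label vector) are already in the workspace and that LIBSVM's Octave interface is on the path. The grid ranges and the 5-fold setting are assumptions chosen for illustration, not tuned values. Note that LIBSVM's interface takes the label vector as the first argument to svmtrain.

```octave
% Assumed inputs: X (m x n double), y (m x 1 double labels).
% Grid ranges below are illustrative assumptions; adjust them to your data.
best_acc = -Inf;
best_c = NaN;
best_g = NaN;

for log2c = -5:2:15
  for log2g = -15:2:3
    c = 2^log2c;
    g = 2^log2g;
    % With -v, svmtrain runs k-fold cross-validation and returns the
    % cross-validation accuracy as a scalar instead of a model. -q silences output.
    opts = sprintf('-s 0 -t 2 -c %g -g %g -v 5 -q', c, g);
    acc = svmtrain(y, X, opts);   % labels first, then the instance matrix
    if acc > best_acc
      best_acc = acc;
      best_c = c;
      best_g = g;
    end
  end
end

printf('best CV accuracy %.2f%% at c = %g, g = %g\n', best_acc, best_c, best_g);

% Retrain on all training data with the selected parameters.
model = svmtrain(y, X, sprintf('-s 0 -t 2 -c %g -g %g -q', best_c, best_g));
```

Cross-validated accuracy is used here rather than accuracy on the training set, since the latter can look good even when the model generalizes poorly.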


Answer 1 (posted 2020-04-28 06:22:24)
