LibSVM prediction accuracy
I am currently trying to run LibSVM, located here: https://www.csie.ntu.edu.tw/~cjlin/libsvm
I only have access to MATLAB 2011b. When I try to run the example data file (heartscale) included with the LibSVM package with different C and gamma values, I get the same accuracy results.
This happens for other data sets as well.
I built a for loop to iterate over the different C and gamma values, and the accuracy percentages do not change.
I am doing this to find the best C and gamma for the data set (cross-validation), as recommended in the guide "A Practical Guide to Support Vector Classification" located on the above website.
When I look at the accuracy_mat that I build below, the values are all the same. Even the outputs from svmpredict are the same.
I have read through the documentation multiple times, looked at the FAQ on the website, and would appreciate input on this from SVM practitioners.
[heart_scale_label, heart_scale_inst] = libsvmread( 'heartscale' );
C = { '2^-5', '2^-3', '2^-1' };
g = { '2^-15', '2^-3', '2^-1' };
accuracy_mat = zeros( length( g ), length( C ) );
data_num = length( heart_scale_inst(:,1) );
t = zeros( data_num, 1 );
for i = 1:length( g )
    for j = 1:length( C )
        c_train_inputs = ['-c ', C{j}];
        g_train_inputs = ['-g ', g{i}];
        c_and_g_inputs = [c_train_inputs, g_train_inputs];
        model = svmtrain( heart_scale_label, ...
                          heart_scale_inst, ...
                          [c_and_g_inputs, '-b 1'] ...
                        );
        [predict_label, ...
         accuracy, ...
         prob_estimates] = svmpredict( heart_scale_label, ...
                                       heart_scale_inst, ...
                                       model, ...
                                       '-b 1' ...
                                     );
        accuracy_mat(i,j) = max( accuracy );
    end
end
The initial range of [C,gamma] hyper-parameters locked you in a corner

Support Vector methods are very powerful engines.
Still, one may destroy their cool predictive powers, either by poor data sanitisation (regularisation, NaN removal, etc.) or by ordering them to use corner-case values of the hyperparameters C or gamma.
Before an SVM/SVC engine is put to hard work, and all the more so if a brute-force hyper-parameter space search (GridSearchCV et al.) is planned, where CPUs/GPUs may easily spend hundreds of hours, a simple rule of thumb ought to be used to pre-validate the search space.
Andreas Mueller has put it nicely: first scan the SVM rbf model (the pre-scan idea is valid in general, not only for an rbf model) over a "rule-of-thumb" range of values:
{'C': np.logspace(-3, 2, 6), 'gamma': np.logspace(-3, 2, 6)}
i.e. unless you are pretty sure (or forbidden by some untold restriction) that you want only the ultra-low learning parametrisation values preset in your [C, gamma] search space, you might relax the range so as to allow the SVM learner to progress to other results, farther from the corner it has been locked in so far:
C = { 0.001, 0.01, 0.1, 1.0, 10.0, 100.0 }
g = { 0.001, 0.01, 0.1, 1.0, 10.0, 100.0 }
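If numeric values like these are fed back into LibSVM's train-option string (as in the question's loop), they have to be rendered as plain decimal text with spaces separating the flags; LibSVM parses option values with atof(), so an expression such as '2^-5' inside the option string is not evaluated. A minimal sketch of the string construction (Python for illustration; in MATLAB, num2str plays the same role):

```python
# Build LibSVM-style option strings from numeric C and gamma values.
# LibSVM parses option values with atof(), so each value must be plain
# decimal text ('0.001', not '2^-5'), and every flag needs a separating space.
C_values = [0.001, 0.01, 0.1, 1.0, 10.0, 100.0]
g_values = [0.001, 0.01, 0.1, 1.0, 10.0, 100.0]

options = ['-c {:g} -g {:g} -b 1'.format(C, g)
           for g in g_values
           for C in C_values]

print(options[0])    # '-c 0.001 -g 0.001 -b 1'
```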
If you do not see any evolution of the SVM learner's results over this sandbox pre-test landscape of its hyper-parameters, then the root cause would be hidden in the DataSET (which does not seem to be the case, as you posted an observation that the same trouble appeared independently of one particular DataSET under review).
Nota bene: one might also rather test descriptive statistical values about the trained model:
model_predictions_accuracy_mean(i,j) = mean( accuracy );
model_predictions_accuracy_var( i,j) = var( accuracy );
accuracy_mat( i,j) = max( accuracy ); %% MAX masks quality-of-fit
                                      %% & may "look" the same
                                      %% for the whole range
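In scikit-learn terms, the same idea, tracking the mean and variance of per-fold accuracies rather than a single maximum, could be sketched as follows (again assuming scikit-learn, with a stand-in dataset):

```python
# Inspect mean and variance of per-fold accuracies instead of a single max;
# the max alone can mask a poor quality-of-fit across folds.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)

scores = cross_val_score(SVC(kernel='rbf', C=1.0, gamma='scale'), X, y, cv=5)
print(scores.mean(), scores.var(), scores.max())
```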