
Really low accuracy with SVM + matlab and libsvm

I cannot figure out why I get such low accuracy when running an SVM on speech vectors. I have cross-verified that the data is not wrong, and have even used a Naive Bayes classifier on it with good results.

First of all I should mention that I have verified that I am not using the same files for training and testing.

I have a set of data for the positive and negative classes that I use for training:

  pos = ones(size(PositiveTraining,1),1);       % +1 labels for the positive class
  neg = ones(size(NegativeTraining,1),1)*-1;    % -1 labels for the negative class

  Training = [ PositiveTraining ; NegativeTraining ];
  TrainingLabels = [pos;neg];

  % train a linear-kernel SVM with libsvm's MATLAB interface
  model = svmtrain( TrainingLabels , Training, '-t 0' );

After obtaining the model I am testing the vectors using the following code:

testing_label_vector = ones(size(mfcc,1),1);   % where mfcc is my testing matrix
[predicted_label, accuracy, dec_values] = svmpredict(testing_label_vector, mfcc, model);
edges = [-1, 1];
counts = histc(predicted_label, edges)         % count how many -1 and +1 predictions

However, I find the accuracy ranges from 0% to at most 13%.

Is there anything that I am doing wrong?

Assuming the data is correct, can somebody suggest how I can increase the accuracy of the classifier?

You need to do parameter selection - you are just using the default parameters. SVM is very sensitive to its parameters. The linear kernel has no parameters, but you still have the penalty parameter C. This parameter trades off between a larger margin and misclassified training points. Larger C will mean that the classifier will try to classify all the training points correctly, but this may not generalize well. Smaller C will allow some points to be misclassified to provide a model that is less sensitive to noise. The value of C is different for every dataset, as it depends greatly on the scaling and distribution, etc.

It's also possible that your dataset is not linearly separable, even for low values of C, so perhaps a non-linear kernel would work better, such as the RBF kernel, which is popular. However, keep in mind those kernels have more parameters and they have to be tuned to work well.
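
For illustration, here is a minimal sketch of such a parameter search in MATLAB, assuming the Training and TrainingLabels variables from the question. It relies on libsvm's built-in cross-validation (the -v option), which makes svmtrain return cross-validation accuracy instead of a model:

% Grid search over the penalty parameter C (the range below is arbitrary).
bestC = 1; bestAcc = 0;
for log2c = -5:2:15
    c = 2^log2c;
    % With '-v 5', svmtrain returns 5-fold cross-validation accuracy (a scalar).
    acc = svmtrain(TrainingLabels, Training, sprintf('-t 0 -c %g -v 5 -q', c));
    if acc > bestAcc
        bestAcc = acc;
        bestC = c;
    end
end

% Retrain on the full training set with the best C found.
model = svmtrain(TrainingLabels, Training, sprintf('-t 0 -c %g', bestC));

For an RBF kernel (-t 2) you would add a second loop over the gamma parameter (-g) in the same way.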

Read the guide written by the authors of libsvm; it explains how to do parameter selection and gives other practical tips for using SVM for classification:

A Practical Guide to Support Vector Classification, by Chih-Wei Hsu, Chih-Chung Chang, and Chih-Jen Lin
