Help--100% accuracy with LibSVM?
Nominally a good problem to have, but I'm pretty sure it is because something funny is going on...
As context, I'm working on a problem in the facial expression/recognition space, so getting 100% accuracy seems incredibly implausible (not that it would be plausible in most applications...). I'm guessing there is either some consistent bias in the data set that is making it overly easy for an SVM to pull out the answer, =or=, more likely, I've done something wrong on the SVM side.
I'm looking for suggestions to help understand what is going on--is it me (=my usage of LibSVM)? Or is it the data?
The details:
Things tried:
Tentative conclusion?:
Something with the data is whacked--somehow, within the data set, there is a subtle, experimenter-driven effect that the SVM is picking up on.
(This doesn't, on first pass, explain why the RBF kernel gives garbage results, however.)
Would greatly appreciate any suggestions on a) how to fix my usage of LibSVM (if that is actually the problem) or b) how to determine what subtle experimenter bias in the data LibSVM is picking up on.
Two other ideas:
Make sure you're not training and testing on the same data. This sounds kind of dumb, but in computer vision applications you should take care that you're not repeating data (say, two frames of the same video falling in different folds), that you're not training and testing on the same individual, etc. It is more subtle than it sounds.
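One way to guard against that kind of leakage is to build the folds per individual (or per video) rather than per frame. Here is a minimal sketch in plain Python, assuming you have some ID for each sample; the names grouped_folds, leaked_groups and the subject IDs are made up for illustration and are not part of LibSVM.

```python
from collections import defaultdict


def grouped_folds(group_ids, n_folds):
    """Assign every sample to a fold so that no group spans two folds.

    group_ids[i] is the (hypothetical) subject or video ID of sample i.
    Returns a list of folds, each a sorted list of sample indices.
    """
    by_group = defaultdict(list)
    for index, group in enumerate(group_ids):
        by_group[group].append(index)

    # Place the largest groups first, always into the currently smallest
    # fold, so that fold sizes stay roughly balanced.
    folds = [[] for _ in range(n_folds)]
    for group in sorted(by_group, key=lambda g: (-len(by_group[g]), str(g))):
        smallest = min(range(n_folds), key=lambda k: len(folds[k]))
        folds[smallest].extend(by_group[group])
    return [sorted(fold) for fold in folds]


def leaked_groups(group_ids, train_idx, test_idx):
    """Return the groups that appear on both sides of a train/test split."""
    train_groups = {group_ids[i] for i in train_idx}
    test_groups = {group_ids[i] for i in test_idx}
    return train_groups & test_groups


if __name__ == "__main__":
    # Toy example: 8 frames from 4 subjects (assumed IDs).
    subjects = ["s1", "s1", "s1", "s2", "s2", "s3", "s4", "s4"]
    folds = grouped_folds(subjects, n_folds=2)
    print(folds)
    print(leaked_groups(subjects, folds[1], folds[0]))
```

Running leaked_groups on the split you are already using is probably the quicker check: if it returns anything other than an empty set, the same person or video is on both sides of the split.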
Make sure you search for gamma and C parameters for the RBF kernel. There are good theoretical (asymptotic) results showing that a linear classifier is just a degenerate RBF classifier. So you should just look for a good (C, gamma) pair.
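A rough sketch of what such a search could look like, in plain Python. The exponential grid follows the coarse ranges suggested in the LibSVM practical guide (C = 2^-5 ... 2^15, gamma = 2^-15 ... 2^3). The cv_accuracy callable is a placeholder I made up: it stands for whatever you use to train an RBF SVM with the given parameters and return cross-validated accuracy.

```python
import itertools
import math


def libsvm_style_grid(log2c=(-5, 15, 2), log2g=(-15, 3, 2)):
    """Yield (C, gamma) pairs on a coarse exponential grid.

    Each argument is (first exponent, last exponent, step), in powers of 2.
    """
    c_exps = range(log2c[0], log2c[1] + 1, log2c[2])
    g_exps = range(log2g[0], log2g[1] + 1, log2g[2])
    for c_exp, g_exp in itertools.product(c_exps, g_exps):
        yield 2.0 ** c_exp, 2.0 ** g_exp


def best_pair(cv_accuracy, grid):
    """Return (best_score, C, gamma) over the grid.

    cv_accuracy(C, gamma) is a hypothetical, user-supplied callable.
    """
    best = None
    for c, gamma in grid:
        score = cv_accuracy(c, gamma)
        if best is None or score > best[0]:
            best = (score, c, gamma)
    return best


if __name__ == "__main__":
    # Stand-in scoring function for the demo only: it peaks at C = 2^3,
    # gamma = 2^-7. Replace it with real cross-validation on your data.
    def fake_cv_accuracy(c, gamma):
        return -((math.log2(c) - 3) ** 2 + (math.log2(gamma) + 7) ** 2)

    print(best_pair(fake_cv_accuracy, libsvm_style_grid()))
```

If you score each pair with svm-train's -v option, keep in mind that it picks its folds at random, so it will not keep frames of the same individual together.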
Notwithstanding that the devil is in the details, here are three simple tests you could try:
In Matlab, use classregtree, or you can load the data into R and use rpart. This could tell you if one or just a few features happen to give a perfect separation.
Method #1 is fast & should be insightful. There are some other methods I could recommend, but #1 and #2 are easy and it would be odd if they don't give any insights.
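If neither Matlab nor R is handy, a cruder version of the same question ("does one feature separate the classes by itself?") can be answered with a one-feature threshold scan, i.e. a decision stump per feature. To be clear, this is a simplified stand-in for the trees that classregtree and rpart fit, not a reimplementation of them, and it only handles two classes. The function name and the toy data are invented for illustration.

```python
def perfectly_separating_features(rows, labels):
    """Find features where a single threshold splits two classes perfectly.

    rows is a list of equal-length numeric feature vectors; labels holds
    exactly two distinct class labels. Returns (feature_index, threshold)
    pairs, where threshold is the midpoint of the gap between the classes.
    """
    classes = sorted(set(labels))
    if len(classes) != 2:
        raise ValueError("this sketch handles exactly two classes")

    hits = []
    for j in range(len(rows[0])):
        a = [row[j] for row, y in zip(rows, labels) if y == classes[0]]
        b = [row[j] for row, y in zip(rows, labels) if y == classes[1]]
        # Perfect separation means the two value ranges do not overlap.
        if max(a) < min(b):
            hits.append((j, (max(a) + min(b)) / 2))
        elif max(b) < min(a):
            hits.append((j, (max(b) + min(a)) / 2))
    return hits


if __name__ == "__main__":
    # Toy data (made up): only the last feature separates the classes.
    rows = [
        [0.1, 5.0, 1.0],
        [0.4, 3.0, 1.5],
        [0.2, 4.0, 3.0],
        [0.3, 6.0, 3.5],
    ]
    labels = ["happy", "happy", "sad", "sad"]
    print(perfectly_separating_features(rows, labels))
```

Any feature this reports is worth a hard look: it may be an index, a timestamp, or some other artifact of how the data was collected rather than anything about the faces.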