C++ data for SVM

Original post: 2014-04-08 13:49:32 | Tags: c++, opencv, bigdata, svm

I'm going to use OpenCV's (C++) SVM (Support Vector Machine) for classification, but I have a problem:

The feature vectors are very large (each has 1,890,000 elements), and I have more than 10,000 of them to train the SVM. How can I manipulate or use these feature vectors without running into memory problems?

1 Answer

With such high dimensions and that many training samples, you will need a lot of memory to use any popular SVM implementation: 10,000 vectors of 1,890,000 elements, stored densely as 32-bit floats, already take about 75.6 GB. If I faced this problem, I would consider at least one of these options:

Reduce the dimension of each vector. There are plenty of algorithms for this, but PCA is a good start.
Get computing time on a host with a lot of memory (one of the Amazon EC2 instances might suffice).
Try a linear online approximation of SVM. In such high dimensions it is very likely that the classes can be separated linearly, and an online SVM approximation lets you load just one sample into memory at a time, so you need far less memory (I would consider pegasos-svm for this).
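
To make the last option more concrete, here is a minimal sketch of a Pegasos-style linear SVM trained one sample at a time. It is not OpenCV code and not a reference implementation: the names PegasosModel, SampleReader and train_stream are made up for illustration, and the reader callback stands in for whatever code loads one feature vector from disk. The sketch also assumes labels of +1 and -1, visits samples in stream order instead of drawing them at random as the original algorithm does, and leaves out the optional projection step.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical linear model trained with Pegasos-style sub-gradient steps.
struct PegasosModel {
    std::vector<float> w;  // one weight per feature
    double lambda;         // regularization strength (assumed tuning parameter)
    std::size_t t = 0;     // number of updates performed so far

    PegasosModel(std::size_t dim, double lambda_) : w(dim, 0.0f), lambda(lambda_) {}

    // Dot product <w, x>, accumulated in double for accuracy.
    double score(const std::vector<float>& x) const {
        double s = 0.0;
        for (std::size_t i = 0; i < w.size(); ++i) s += static_cast<double>(w[i]) * x[i];
        return s;
    }

    // One update from a single sample; y must be +1 or -1.
    void update(const std::vector<float>& x, int y) {
        assert(y == 1 || y == -1);
        assert(x.size() == w.size());
        ++t;
        const double eta = 1.0 / (lambda * static_cast<double>(t));  // step size 1/(lambda*t)
        const bool violated = y * score(x) < 1.0;                    // hinge loss is active
        const double shrink = 1.0 - eta * lambda;                    // equals 1 - 1/t
        for (std::size_t i = 0; i < w.size(); ++i) {
            double wi = shrink * w[i];
            if (violated) wi += eta * y * x[i];
            w[i] = static_cast<float>(wi);
        }
    }

    int predict(const std::vector<float>& x) const { return score(x) >= 0.0 ? 1 : -1; }
};

// Hypothetical reader: fills x and y with the next sample, returns false at end of data.
using SampleReader = std::function<bool(std::vector<float>&, int&)>;

// One pass over the stream; only a single sample buffer is held in memory.
std::size_t train_stream(PegasosModel& model, const SampleReader& read_next) {
    std::vector<float> x(model.w.size());
    int y = 0;
    std::size_t seen = 0;
    while (read_next(x, y)) {
        model.update(x, y);
        ++seen;
    }
    return seen;
}
```

With 1,890,000 features, the weight vector and the single sample buffer each take about 7.56 MB as 32-bit floats, so the working set stays small no matter how many samples are in the stream. Calling train_stream several times, each time with a reader that starts again at the beginning of the data, gives multiple passes over the training set.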