
How to convert numpy array into libsvm format

Original post: 2014-10-22 14:02:12 · Tags: python, arrays, numpy, svm, libsvm

I have a numpy array for an image and am trying to dump it into the libsvm format of LABEL I0:V0 I1:V1 I2:V2..IN:VN. I see that scikit-learn has a dump_svmlight_file function and would like to use that if possible, since it's optimized and stable.

It takes the parameters X, y, and the output file name. The values I'm thinking about would be:

X - numpy array
y - ????
file output name - self-explanatory

Would this be a correct assumption for X? I'm very confused about what I should pass for y, though. It appears it needs to be a feature set of some kind, but I don't know how I would go about obtaining that. Thanks in advance for the help!

1 answer

The svmlight format is tailored to classification/regression problems. Therefore, the array X is a matrix with as many rows as data points in your set, and as many columns as features. y is the vector of instance labels, one per row of X.
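To make the mapping concrete, here is a minimal sketch of how one row of X and its entry in y could be written as a single LABEL I:V line. The helper name to_libsvm_line is hypothetical (it is not part of scikit-learn or libsvm), and the sketch assumes zero-valued features are simply omitted, as is usual for this sparse text format.

```python
def to_libsvm_line(label, features, zero_based=True):
    """Format one sample as 'LABEL I:V I:V ...' (illustrative helper only)."""
    # Feature indices start at 0 or 1 depending on the convention in use.
    offset = 0 if zero_based else 1
    # Keep only non-zero features; the format is sparse.
    pairs = [f"{i + offset}:{v:g}" for i, v in enumerate(features) if v != 0]
    return " ".join([f"{label:g}"] + pairs)


line_zero_based = to_libsvm_line(1, [0.5, 0.0, 2.0])
line_one_based = to_libsvm_line(-1, [0.5, 0.0, 2.0], zero_based=False)
print(line_zero_based)
print(line_one_based)
```

The label comes from y and the index:value pairs come from the matching row of X, which is why both arrays need the same number of rows.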

For example, suppose you have 1000 objects (images of bicycles and bananas, say), featurized in 400 dimensions. X would be 1000x400, and y would be a 1000-vector with a 1 entry wherever the object is a bicycle and a -1 entry wherever it is a banana.
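A scaled-down sketch of that setup with dump_svmlight_file might look like the following. The image data, its 4x4 size, and the labels are all made up for illustration, and using raw pixel values as features is only an assumption. Each image is flattened so that it becomes one row of X. The output goes to an in-memory buffer here; a file path string can be passed instead.

```python
import io

import numpy as np
from sklearn.datasets import dump_svmlight_file

rng = np.random.default_rng(0)

# Made-up stand-in data: 6 grayscale "images" of 4x4 pixels.
images = rng.random((6, 4, 4))

# One row per image, one column per pixel (6 x 16).
X = images.reshape(len(images), -1)

# One label per image: 1 for a bicycle, -1 for a banana (invented labels).
y = np.array([1, -1, 1, 1, -1, -1])

# Write to a binary buffer instead of a file on disk.
buffer = io.BytesIO()
dump_svmlight_file(X, y, buffer, zero_based=True)

lines = buffer.getvalue().decode("utf-8").splitlines()
print(lines[0])
```

Each output line corresponds to one image. Note that zero_based=True numbers features from 0; some libsvm tools expect indices starting at 1, in which case zero_based=False would be the setting to try.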