
An example using python bindings for SVM library, LIBSVM

I am in dire need of a classification task example using LibSVM in Python. I don't know what the input should look like, which function is responsible for training, and which one for testing. Thanks

The code examples listed here don't work with LibSVM 3.1, so I've more or less ported the example by mossplix:

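from svmutil import *

# Convenience wrapper: predict the label of a single instance x.
# svm_predict returns (labels, accuracy, values); [0][0] is the first predicted label.
svm_model.predict = lambda self, x: svm_predict([0], [x], self)[0][0]

# Two training instances with labels 1 and -1
prob = svm_problem([1, -1], [[1, 0, 1], [-1, 0, -1]])

# Linear kernel, cost C = 10
param = svm_parameter()
param.kernel_type = LINEAR
param.C = 10

m = svm_train(prob, param)

m.predict([1, 1, 1])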

This example demonstrates a simple SVM classifier (strictly speaking a two-class C-SVC, since the labels are 1 and 0 and svm_type is left at its default); it's about as simple as possible while still showing the complete LIBSVM workflow.

Step 1: Import NumPy & LIBSVM

import numpy as NP
from svm import *

Step 2: Generate synthetic data: for this example, 500 points within a given boundary (note: quite a few real data sets are provided on the LIBSVM website)

Data = NP.random.randint(-5, 5, 1000).reshape(500, 2)

Step 3: Now, choose some non-linear decision boundary for the classifier:

rx = [1 if (x**2 + y**2) < 9 else 0 for (x, y) in Data]

Step 4: Next, arbitrarily partition the data with respect to this decision boundary:

  • Class I: those that lie inside a circle of radius 3 (x**2 + y**2 < 9)

  • Class II: all points outside the decision boundary (circle), including those exactly on it
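The labeling in Steps 2 to 4 can be checked without NumPy or LIBSVM. This is a minimal standard-library sketch; circle_label and make_data are hypothetical helper names, and random.randint(-5, 4) is inclusive on both ends, so it draws from the same integers as NP.random.randint(-5, 5). Note that the strict < 9 puts points exactly on the circle, such as (3, 0), into class 0.

```python
import random


def circle_label(x, y, r_squared=9):
    """Return 1 for points strictly inside x**2 + y**2 < r_squared, else 0.

    Mirrors the rule used for `rx`; points exactly on the circle get 0.
    """
    return 1 if x * x + y * y < r_squared else 0


def make_data(n_points, seed=0):
    """Hypothetical helper: integer points in [-5, 4] x [-5, 4] plus their labels."""
    rng = random.Random(seed)
    data = [[rng.randint(-5, 4), rng.randint(-5, 4)] for _ in range(n_points)]
    labels = [circle_label(x, y) for x, y in data]
    return data, labels


data, labels = make_data(500)
print(sum(labels), "points in class I,", len(labels) - sum(labels), "in class II")
```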


The SVM model building begins here; all steps before this one were just to prepare some synthetic data.

Step 5: Construct the problem description by calling svm_problem, passing in the class labels produced by the decision boundary (rx) and the data, then bind the result to a variable.

px = svm_problem(rx, Data)

Step 6: Select a kernel function for the non-linear mapping

For this example, I chose RBF (radial basis function) as my kernel function

pm = svm_parameter(kernel_type=RBF)

Step 7: Train the classifier by calling svm_model, passing in the problem description (px) and the parameters, including the kernel (pm)

v = svm_model(px, pm)

Step 8: Finally, test the trained classifier by calling predict on the trained model object ('v')

v.predict([3, 1])
# returns the class label (either '1' or '0')

For the example above, I used version 3.0 of LIBSVM (the current stable release at the time this answer was posted).

Finally, with respect to the part of your question regarding the choice of kernel function: Support Vector Machines are not tied to a particular kernel function; e.g., I could have chosen a different kernel (linear, polynomial, sigmoid, etc.).

LIBSVM includes all of the most commonly used kernel functions, which is a big help: you can see all plausible alternatives, and selecting one for use in your model is just a matter of calling svm_parameter and passing in a value for kernel_type (a constant naming the chosen kernel, e.g. LINEAR, POLY, RBF, or SIGMOID).

Note also that the kernel function you choose for training must match the kernel function used against the testing data.

LIBSVM reads the data from two lists: the first contains the class labels and the second contains the input data. Create a simple dataset with two possible classes; you also need to specify which kernel you want to use by creating an svm_parameter.


>>> from libsvm import *
>>> prob = svm_problem([1, -1], [[1, 0, 1], [-1, 0, -1]])
>>> param = svm_parameter(kernel_type=LINEAR, C=10)
>>> # training the model
>>> m = svm_model(prob, param)
>>> # testing the model
>>> m.predict([1, 1, 1])

You might consider using http://scikit-learn.sourceforge.net/, which has a great Python binding of libsvm and should be easy to install.

Adding to @shinNoNoir:

param.kernel_type represents the type of kernel function you want to use: 0 = linear, 1 = polynomial, 2 = RBF, 3 = sigmoid.

Also keep in mind that in svm_problem(y, x), y is the class labels and x is the class instances, and x and y can only be lists, tuples, and dictionaries (in the version discussed here, NumPy arrays are not accepted).
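As a sketch of the instance formats mentioned above: a list row [v1, v2, ...] is treated as features 1, 2, ... (feature indices are 1-based), while a dict maps feature index to value, so sparse data can leave out zero entries. The helpers below are hypothetical, not part of LIBSVM; they convert a dense row to the dict form and to one line of the LIBSVM data-file format (<label> <index>:<value> ...), the format used by the data sets on the LIBSVM website.

```python
def to_libsvm_dict(row):
    """Convert a dense feature row to a sparse {index: value} dict.

    Indices are 1-based and zero values are omitted.
    """
    return {i + 1: float(v) for i, v in enumerate(row) if v != 0}


def to_libsvm_line(label, row):
    """Format one instance as a LIBSVM data-file line: '<label> <index>:<value> ...'."""
    features = " ".join(f"{i}:{v:g}" for i, v in sorted(to_libsvm_dict(row).items()))
    return f"{label:g} {features}".rstrip()


y = [1, -1]
x = [[1, 0, 1], [-1, 0, -1]]
print([to_libsvm_dict(r) for r in x])
print("\n".join(to_libsvm_line(lbl, r) for lbl, r in zip(y, x)))
```

The dicts could then be passed as x, e.g. svm_problem(y, [to_libsvm_dict(r) for r in x]), assuming the svmutil-style interface.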

param = svm_parameter('-s 0 -t 2 -d 3 -c '+str(C)+' -g '+str(G)+' -p '+str(self.epsilon)+' -n '+str(self.nu))

I don't know about the earlier versions, but in LibSVM 3.xx the method svm_parameter('options') takes just one argument.

In my case, C, G, p, and nu are dynamic values; adapt them to your own code.


options:

    -s svm_type : set type of SVM (default 0)
        0 -- C-SVC      (multi-class classification)
        1 -- nu-SVC     (multi-class classification)
        2 -- one-class SVM
        3 -- epsilon-SVR    (regression)
        4 -- nu-SVR     (regression)
    -t kernel_type : set type of kernel function (default 2)
        0 -- linear: u'*v
        1 -- polynomial: (gamma*u'*v + coef0)^degree
        2 -- radial basis function: exp(-gamma*|u-v|^2)
        3 -- sigmoid: tanh(gamma*u'*v + coef0)
        4 -- precomputed kernel (kernel values in training_set_file)
    -d degree : set degree in kernel function (default 3)
    -g gamma : set gamma in kernel function (default 1/num_features)
    -r coef0 : set coef0 in kernel function (default 0)
    -c cost : set the parameter C of C-SVC, epsilon-SVR, and nu-SVR (default 1)
    -n nu : set the parameter nu of nu-SVC, one-class SVM, and nu-SVR (default 0.5)
    -p epsilon : set the epsilon in loss function of epsilon-SVR (default 0.1)
    -m cachesize : set cache memory size in MB (default 100)
    -e epsilon : set tolerance of termination criterion (default 0.001)
    -h shrinking : whether to use the shrinking heuristics, 0 or 1 (default 1)
    -b probability_estimates : whether to train a SVC or SVR model for probability estimates, 0 or 1 (default 0)
    -wi weight : set the parameter C of class i to weight*C, for C-SVC (default 1)
    -v n: n-fold cross validation mode
    -q : quiet mode (no outputs)
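To make the kernel formulas listed under -t concrete, here is a plain-Python sketch of kernel types 0 to 3 (the precomputed kernel, -t 4, is omitted). The function names are hypothetical; gamma = 1/num_features and coef0 = 0, degree = 3 mirror the documented defaults.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def linear(u, v):
    return dot(u, v)  # -t 0: u'*v


def polynomial(u, v, gamma, coef0=0.0, degree=3):
    return (gamma * dot(u, v) + coef0) ** degree  # -t 1


def rbf(u, v, gamma):
    sq_dist = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-gamma * sq_dist)  # -t 2: exp(-gamma*|u-v|^2)


def sigmoid(u, v, gamma, coef0=0.0):
    return math.tanh(gamma * dot(u, v) + coef0)  # -t 3


u, v = [1.0, 0.0, 1.0], [-1.0, 0.0, -1.0]
gamma = 1.0 / len(u)  # default gamma: 1/num_features
print(linear(u, v), polynomial(u, v, gamma), rbf(u, v, gamma), sigmoid(u, v, gamma))
```

Note that the RBF value is 1 for identical points and decays with squared distance, which is why gamma effectively sets the width of the RBF decision boundary.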

Source of documentation: https://www.csie.ntu.edu.tw/~cjlin/libsvm/
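Because svm_parameter accepts the options as a single string, it can help to assemble that string from keyword values rather than by concatenating str() calls. The helper below is hypothetical (not part of LIBSVM) and covers only a few of the options listed above; anything it omits falls back to LIBSVM's defaults.

```python
def build_options(svm_type=0, kernel_type=2, cost=1.0, gamma=None,
                  degree=None, epsilon=None, nu=None, quiet=True):
    """Assemble a LIBSVM option string such as '-s 0 -t 2 -c 10 -g 0.5'.

    Only options that are given are emitted.
    """
    parts = [f"-s {svm_type}", f"-t {kernel_type}", f"-c {cost:g}"]
    if gamma is not None:
        parts.append(f"-g {gamma:g}")
    if degree is not None:
        parts.append(f"-d {degree}")
    if epsilon is not None:
        parts.append(f"-p {epsilon:g}")  # epsilon in the epsilon-SVR loss
    if nu is not None:
        parts.append(f"-n {nu:g}")
    if quiet:
        parts.append("-q")  # suppress training output
    return " ".join(parts)


print(build_options(cost=10, gamma=0.5))
```

The result could then be passed as svm_parameter(build_options(kernel_type=0, cost=10)), assuming the LIBSVM 3.x interface described above.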

SVM via scikit-learn:

from sklearn.svm import SVC
X = [[0, 0], [1, 1]]
y = [0, 1]
model = SVC().fit(X, y)

tests = [[0.,0.], [0.49,0.49], [0.5,0.5], [2., 2.]]
print(model.predict(tests))
# prints [0 0 1 1]

For more details, see: http://scikit-learn.org/stable/modules/svm.html#svm

Here is a dummy example I mashed up:

import numpy
import matplotlib.pyplot as plt
from random import seed
from random import randrange

import svmutil as svm

seed(1)

# Creating Data (Dense)
train = list([randrange(-10, 11), randrange(-10, 11)] for i in range(10))
labels = [-1, -1, -1, 1, 1, -1, 1, 1, 1, 1]
options = '-t 0'  # linear model
# Training Model
model = svm.svm_train(labels, train, options)


# Line Parameters
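# Line parameters: w = sum of sv_coef_i * x_i over the support vectors
# (get_sv_indices() is 1-based, hence "- 1"), and b = -rho
sv = numpy.array(train)[numpy.array(model.get_sv_indices()) - 1]
w = numpy.matmul(sv.T, numpy.array(model.get_sv_coef())).ravel()
b = -model.rho.contents.value
# LIBSVM's decision value is positive for the first label in model.get_labels();
# flipping (w, b) changes which side counts as positive but not the plotted line
if model.get_labels()[1] == -1:
    w = -w
    b = -b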

print(w)
print(b)

# Plotting
plt.figure(figsize=(6, 6))
for i in model.get_sv_indices():
    plt.scatter(train[i - 1][0], train[i - 1][1], color='red', s=80)
train = numpy.array(train).T
plt.scatter(train[0], train[1], c=labels)
plt.plot([-5, 5], [-(-5 * w[0] + b) / w[1], -(5 * w[0] + b) / w[1]])
plt.xlim([-13, 13])
plt.ylim([-13, 13])
plt.show()
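To see why (w, b) can be read off the model this way, here is a pure-Python sketch of the linear decision function. For a linear kernel, the decision value f(x) = sum_i coef_i * <sv_i, x> - rho equals <w, x> + b with w = sum_i coef_i * sv_i and b = -rho, and for two classes LIBSVM predicts the first label in get_labels() when f(x) > 0. The support vectors, coefficients, and rho below are made-up numbers, not output from the model above, and the helper names are hypothetical.

```python
def linear_decision_value(support_vectors, sv_coef, rho, x):
    """Decision value f(x) = sum_i coef_i * <sv_i, x> - rho for a linear kernel."""
    return sum(c * sum(a * b for a, b in zip(sv, x))
               for sv, c in zip(support_vectors, sv_coef)) - rho


def primal_weights(support_vectors, sv_coef):
    """w = sum_i coef_i * sv_i, so that f(x) = <w, x> + b with b = -rho."""
    dim = len(support_vectors[0])
    return [sum(c * sv[k] for sv, c in zip(support_vectors, sv_coef)) for k in range(dim)]


# Hypothetical values standing in for a trained binary model
svs = [[1.0, 2.0], [-1.0, -1.0]]
coef = [0.5, -0.5]
rho = 0.25
labels = [1, -1]  # plays the role of model.get_labels()

x = [2.0, 0.0]
w = primal_weights(svs, coef)
b = -rho
f = linear_decision_value(svs, coef, rho, x)
# Dual form and primal form give the same decision value
assert abs(f - (sum(wk * xk for wk, xk in zip(w, x)) + b)) < 1e-12
predicted = labels[0] if f > 0 else labels[1]
print(w, b, f, predicted)
```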

