
Sklearn SVM vs Matlab SVM

Problem: I need to train a classifier (in Matlab) to classify multiple levels of signal noise.

So I trained a multi-class SVM in Matlab using fitcecoc and obtained an accuracy of 92%.

Then I trained a multi-class SVM using sklearn.svm.SVC in Python, but it seems that however I fiddle with the parameters, I cannot achieve more than 69% accuracy.

30% of the data was held back and used to verify the training. The confusion matrices can be seen below.

Matlab confusion matrix

Python confusion matrix
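For reference, the Python confusion matrix could be produced with something like the following minimal sketch, assuming the y_test and Out variables defined in the code further down:

from sklearn.metrics import confusion_matrix

# rows are the true classes, columns the predicted classes
cm = confusion_matrix(y_test, Out)
print(cm)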

So if anyone has experience with svm.SVC multi-class training and can spot a problem in my code, or has a suggestion, it would be greatly appreciated.

Python code:

import numpy as np
from sklearn import svm
from sklearn.model_selection import cross_val_score
from sklearn.model_selection import train_test_split
#from sklearn import preprocessing

#### SET fitting parameters here
C = 100
gamma = 1e-8

#### SET WEIGHTS HERE
C0_Weight = 1*C
C1_weight = 1*C
C2_weight = 1*C
C3_weight = 1*C
C4_weight = 1*C
#####


X = np.genfromtxt('data/features.csv', delimiter=',')
Y = np.genfromtxt('data/targets.csv', delimiter=',')

print 'feature data is of size: ' + str(X.shape)
print 'target data is of size: ' + str(Y.shape)

# SPLIT X AND Y INTO TRAINING AND TEST SET
test_size = 0.3
X_train, x_test, Y_train, y_test = train_test_split(X, Y,
    test_size=test_size, random_state=0)

svc = svm.SVC(C=C, kernel='rbf', gamma=gamma, class_weight={0: C0_Weight,
    1: C1_weight, 2: C2_weight, 3: C3_weight, 4: C4_weight}, cache_size=1000)

svc.fit(X_train, Y_train)
scores = cross_val_score(svc, X_train, Y_train, cv=10)
print scores
print("Accuracy: %0.2f (+/- %0.2f)" % (scores.mean(), scores.std() * 2))

Out = svc.predict(x_test)

np.savetxt("data/testPredictions.csv", Out, delimiter=",")
np.savetxt("data/testTargets.csv", y_test, delimiter=",")

# calculate accuracy in test data
Hits = 0
HitsOverlap = 0
for idx, val in enumerate(Out):
    Hits += int(y_test[idx]==Out[idx])
    HitsOverlap += int(y_test[idx]==Out[idx]) + int(y_test[idx]==(Out[idx]-1)) + int(y_test[idx]==(Out[idx]+1))

print "Accuracy in testset: ", Hits*100/(11595*test_size)
print "Accuracy in testset w. overlap: ", HitsOverlap*100/(11595*test_size)

To those curious how I got the parameters: they were found with GridSearchCV (which increased the accuracy from 40% to 69%).
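A grid search of that kind would look roughly like the minimal sketch below; the parameter ranges are illustrative assumptions, not the ones actually used in the post:

from sklearn.model_selection import GridSearchCV
from sklearn import svm

# illustrative search ranges only -- the real ranges are not given in the post
param_grid = {'C': [1, 10, 100, 1000],
              'gamma': [1e-9, 1e-8, 1e-7, 1e-6]}
search = GridSearchCV(svm.SVC(kernel='rbf'), param_grid, cv=5)
search.fit(X_train, Y_train)
print(search.best_params_)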

Any help or suggestions are greatly appreciated.

After much pulling of my hair, the answer was found here: http://neerajkumar.org/writings/svm/

When the inputs were scaled with StandardScaler(), svm.SVC now produces results superior to Matlab's!
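A minimal sketch of that fix, assuming the same X_train, x_test, Y_train and y_test as in the question; wrapping the scaler and the SVM in a Pipeline keeps the scaler fitted on the training data only:

from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn import svm

# default RBF parameters used here as a starting point
model = make_pipeline(StandardScaler(), svm.SVC(kernel='rbf'))
model.fit(X_train, Y_train)
print(model.score(x_test, y_test))   # accuracy on the held-out 30%

Note that the C and gamma found by the earlier grid search were tuned on unscaled features, so they would normally be re-searched after adding the scaler.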
