How to improve the accuracy of a support vector machine in Python?

Question

I've been trying to fit some data and predict on it, using the SVC class in sklearn for training. My problem is that the data are complicated and I don't know how to classify them. I'm uploading a 3D figure here. The dataset has about 800 rows and 3 columns. I used gamma=100 and C=10.0, and after splitting the dataset and testing I got accuracies between 61.0 and 64.0 percent, but I think I can do better than that. I set the kernel to 'rbf', and after some tests I concluded that 'rbf' is a good choice. But after reading the SVM documentation here and the kernel functions here, I got confused. Here are my questions:

1. Which kernel should I use (based on my dataset, which is uploaded here)?
2. What other parameters should I change for the classification task?

Please help me get good accuracy. Here is my dataset:

import numpy as np
from sklearn import svm
from sklearn.model_selection import train_test_split

# X (features, about 800 rows x 3 columns) and labels are assumed to be loaded already
model = svm.SVC(C=1.0, gamma=100, kernel='rbf')
X_train, X_test, y_train, y_test = train_test_split(X, labels)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
print(y_pred)
# true labels, then the fraction of correct predictions on the test split
print('\n\n\n', y_test, '\n\n\n', (np.array(y_test) == y_pred).mean())

Answer 1

Just a note: you did not actually provide any dataset, just the source code.

Using a different kernel seems like a good idea. From that image alone it's really hard to say which kernel will perform better than the others; the choice of kernel usually requires some intuition or domain knowledge.

Since scikit-learn has only four built-in kernels ('linear', 'poly', 'rbf' and 'sigmoid'), I think you should just try all of them and compare them, maybe using cross-validation, to see which performs best. Some of the kernels are parametrized, so there you can try several settings; for the polynomial kernel, for example, degrees up to 10. Using a degree higher than 10 might not help anything, but that's just my guess.
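A rough sketch of that comparison, for illustration only. The asker's data was not posted, so `make_classification` generates a synthetic stand-in of roughly the same shape (800 rows, 3 columns); the helper name `compare_kernels` and the 5-fold setting are my own choices, not something from the question.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

# Synthetic stand-in for the real data (assumption: ~800 rows, 3 feature columns).
X, labels = make_classification(
    n_samples=800, n_features=3, n_informative=3, n_redundant=0, random_state=0
)

def compare_kernels(X, y, kernels=("linear", "poly", "rbf", "sigmoid"), cv=5):
    """Return the mean cross-validated accuracy for each built-in SVC kernel."""
    scores = {}
    for kernel in kernels:
        clf = SVC(kernel=kernel)  # all other parameters left at their defaults
        scores[kernel] = cross_val_score(clf, X, y, cv=cv).mean()
    return scores

kernel_scores = compare_kernels(X, labels)
# Print kernels from best to worst mean accuracy.
for kernel, score in sorted(kernel_scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{kernel:8s} {score:.3f}")
```

On the real data, replace the synthetic `X` and `labels` with your own arrays. The ranking on synthetic data says nothing about which kernel suits your dataset.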

You should also try different values for the C parameter. In most machine learning algorithms, the constants that weight individual loss terms against each other (which is also the case here, where C weights the classification error against the margin term) have a "multiplicative" impact (for lack of a better word), so I advise trying values spaced by factors of ten for C: [1e-3, 1e-2, 1e-1, 1, 10, 100]
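One possible way to run that search is scikit-learn's `GridSearchCV`, sketched below. Again the data is a synthetic stand-in generated with `make_classification`, and fixing the kernel to 'rbf' with 5-fold cross-validation is an assumption made only to keep the example short.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Synthetic stand-in for the real data (assumption: ~800 rows, 3 feature columns).
X, labels = make_classification(
    n_samples=800, n_features=3, n_informative=3, n_redundant=0, random_state=0
)

# Candidate values for C, spaced by factors of ten as suggested above.
C_VALUES = [1e-3, 1e-2, 1e-1, 1, 10, 100]

# Cross-validated search over C for an RBF-kernel SVC.
search = GridSearchCV(SVC(kernel="rbf"), param_grid={"C": C_VALUES}, cv=5)
search.fit(X, labels)

print("best C:", search.best_params_["C"])
print("mean CV accuracy:", round(search.best_score_, 3))
```

Other parameters can be searched the same way by adding keys to `param_grid`, for example `degree` when the kernel is 'poly'.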

0 2019-03-15 09:30:30