
Extracting the Classification of an SVM Model in Python

I'm using SVM for the first time in Python. I have also used 5-fold cross-validation to check the accuracy of the model.

The objective of the model is to classify whether the output is a defect or not. I would like to cross-check the output classification against the original dataset. In other words, I would like to understand which products have been classified as a defect and which specific products haven't been. How do I go about it?

My code:

from sklearn.svm import SVC
svclassifier_rbf = SVC(kernel='rbf')
clf = svclassifier_rbf.fit(X_train, y_train)

from sklearn.metrics import classification_report, confusion_matrix
print(confusion_matrix(y_test, y_pred_A_rbf))
print(classification_report(y_test, y_pred_A_rbf))

Thank you, Nimish

I do not see the cross-validation part of the code. Assuming you have done it, and it looks something like this:

import numpy as np
from sklearn.model_selection import KFold
from sklearn.svm import SVC

# X and y_true are assumed to be NumPy arrays (call .to_numpy() on pandas objects)
y_pred_A_rbf = np.zeros_like(y_true)  # will hold the out-of-fold prediction for every record

kf = KFold(n_splits=5, shuffle=True)
for train_index, test_index in kf.split(X):
    X_train, X_test = X[train_index], X[test_index]
    y_train = y_true[train_index]
    svclassifier_rbf = SVC(kernel='rbf')
    svclassifier_rbf.fit(X_train, y_train)
    ###### NEXT LINE NEEDED #######
    y_pred_A_rbf[test_index] = svclassifier_rbf.predict(X_test)
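As an alternative to writing the loop by hand, scikit-learn's cross_val_predict returns the out-of-fold prediction for every sample in a single call. This is a minimal sketch under assumptions: make_classification stands in for your real X and y_true, and label 1 is taken to mean "defect".

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import KFold, cross_val_predict
from sklearn.svm import SVC

# Synthetic stand-in for the question's data (assumption: binary labels, 1 = defect)
X, y_true = make_classification(n_samples=200, n_features=5, random_state=0)

# Same 5-fold scheme as the loop above; random_state added only for reproducibility
kf = KFold(n_splits=5, shuffle=True, random_state=0)

# Each sample is predicted by the model trained on the four folds that did not contain it
y_pred_A_rbf = cross_val_predict(SVC(kernel='rbf'), X, y_true, cv=kf)

print(y_pred_A_rbf[:10])
```

The result has one prediction per row of X, in the original row order, so it can be compared directly with y_true.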

What is missing in your code is the call to svclassifier_rbf.predict(X_test).

This is what actually predicts your classes. You can now take the values in y_pred_A_rbf and pass them, together with the true labels, to a confusion matrix to read off your true positives, true negatives, false positives and false negatives. For binary 0/1 labels, scikit-learn's confusion_matrix puts the actual classes in rows and the predicted classes in columns, so it reads [[TN, FP], [FN, TP]]. A typical confusion matrix in Python can be mapped to the following picture:

[Figure: Confusion matrix in Python]
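To read the four counts directly instead of from the printed matrix, you can flatten the 2x2 matrix and unpack it. A minimal sketch with made-up labels (the arrays are hypothetical; substitute your own y_true and y_pred_A_rbf):

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Hypothetical labels for illustration (1 = defect, 0 = no defect)
y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])
y_pred_A_rbf = np.array([1, 0, 0, 1, 0, 1, 1, 0])

# labels=[0, 1] fixes the order: rows = actual class, columns = predicted class
cm = confusion_matrix(y_true, y_pred_A_rbf, labels=[0, 1])
tn, fp, fn, tp = cm.ravel()

print(cm)
print(f"TN={tn} FP={fp} FN={fn} TP={tp}")  # TN=3 FP=1 FN=1 TP=3
```

Passing labels=[0, 1] keeps the TN, FP, FN, TP order even if one class happens to be missing from a small test fold.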

Now that you have two arrays, the actual labels and the predicted labels, you can compare them record by record. If both the actual and the predicted label are positive (1), the record is a true positive and was classified correctly. In the same way you can find the true negatives, false positives and false negatives, and so see which records were classified correctly and which were not.

For example, if you want to know which records have been correctly classified as the positive class (in this case, let's assume that means defect), you can do:

import numpy as np
tp = np.where((y_true == 1) & (y_pred_A_rbf == 1), 'True Positive', 'Else')  # one label per record

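This labels every record as either 'True Positive' or 'Else'. If you want the indexes of the records correctly classified as the positive class instead, call np.where with only the condition: np.where((y_true == 1) & (y_pred_A_rbf == 1))[0].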
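Extending the same idea to all four outcomes, you can tag every record and keep the tag next to the labels in a pandas DataFrame, which makes it easy to pull out, for example, the defects the model missed. This is a sketch with hypothetical arrays; np.select is swapped in for np.where here because it handles several conditions at once.

```python
import numpy as np
import pandas as pd

# Hypothetical labels; in practice use your y_true and y_pred_A_rbf
y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])
y_pred_A_rbf = np.array([1, 0, 0, 1, 0, 1, 1, 0])

conditions = [
    (y_true == 1) & (y_pred_A_rbf == 1),  # defect correctly flagged
    (y_true == 0) & (y_pred_A_rbf == 0),  # non-defect correctly passed
    (y_true == 0) & (y_pred_A_rbf == 1),  # non-defect wrongly flagged
    (y_true == 1) & (y_pred_A_rbf == 0),  # defect missed
]
outcomes = ['True Positive', 'True Negative', 'False Positive', 'False Negative']

# np.select assigns each record the outcome of the first condition it satisfies
results = pd.DataFrame({
    'actual': y_true,
    'predicted': y_pred_A_rbf,
    'outcome': np.select(conditions, outcomes, default='Unknown'),
})

print(results)
# Row positions of the missed defects
print(results[results['outcome'] == 'False Negative'].index.tolist())
```

The row positions line up with the rows of the test data, so they can be used to look up the original products.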

If you just want to test the model's accuracy on a classification problem, use:

from sklearn.metrics import accuracy_score
accuracy_score(y_test,clf.predict(your_X_test))
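Since the question mentions 5-fold cross-validation, cross_val_score gives one accuracy value per fold rather than a single split. A minimal sketch; the synthetic data from make_classification is an assumption standing in for your dataset.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

# Synthetic stand-in data (assumption); replace with your own X and y
X, y = make_classification(n_samples=200, n_features=5, random_state=0)

# cv=5 -> five accuracy scores, one per held-out fold
scores = cross_val_score(SVC(kernel='rbf'), X, y, cv=5, scoring='accuracy')

print(scores)
print(f"mean accuracy: {scores.mean():.3f} (+/- {scores.std():.3f})")
```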

Refer to my Git link for document classification, where I used Naive Bayes on top of TF-IDF/count vectorizer features.

Document classification using MultinomialNB

Hope this helps you with document classification.
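The linked repository is not reproduced here; the following is only a generic sketch of the same idea (TF-IDF features feeding MultinomialNB), using a made-up four-document corpus.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Tiny made-up corpus for illustration only
docs = [
    "cheap pills buy now", "limited offer buy cheap",
    "meeting agenda for monday", "project status meeting notes",
]
labels = ["spam", "spam", "work", "work"]

# TF-IDF features are passed straight into the multinomial Naive Bayes classifier
model = make_pipeline(TfidfVectorizer(), MultinomialNB())
model.fit(docs, labels)

print(model.predict(["buy cheap offer", "monday project meeting"]))
```

Swapping TfidfVectorizer for CountVectorizer gives the count-based variant mentioned above.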

You can get the records that are predicted as defects using the following code. I am assuming that X_test is your test input data.

print(X_test[y_pred_A_rbf==1])

There are many ways to test how accurate your y_pred is. Basically, you need to compare y_pred with y_test. If you are new to this field and find the confusion matrix and reports hard to interpret, you can simply write y_pred to a CSV file and compare it with y_test. That shows you, record by record, what the model actually predicted.

import numpy as np
np.savetxt("filename.csv", y_pred, delimiter=",", fmt="%.5f")
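A variant of the CSV approach writes the actual and predicted labels side by side, so each row can be checked directly. Sketch with hypothetical arrays; the file name predictions_vs_actual.csv is arbitrary.

```python
import numpy as np
import pandas as pd

# Hypothetical arrays; in practice use your y_test and y_pred_A_rbf
y_test = np.array([1, 0, 1, 1, 0])
y_pred_A_rbf = np.array([1, 0, 0, 1, 1])

# One row per test record: actual label, predicted label, and whether they agree
comparison = pd.DataFrame({
    'actual': y_test,
    'predicted': y_pred_A_rbf,
})
comparison['correct'] = comparison['actual'] == comparison['predicted']

# index=True keeps the row position so records can be traced back to X_test
comparison.to_csv('predictions_vs_actual.csv', index=True)
print(comparison)
```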

Note: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.