
Extracting the Classification of an SVM Model in Python

I'm using SVM for the first time in Python. I have also used 5-fold cross-validation to check the accuracy of the model.

The objective of the model is to classify whether the output is a defect or not. I would like to cross-check the output classification against the original dataset. In other words, I would like to understand which products have been classified as defects and which specifically haven't. How do I go about it?

My code:

from sklearn.svm import SVC
svclassifier_rbf = SVC(kernel='rbf')
clf = svclassifier_rbf.fit(X_train, y_train)

from sklearn.metrics import classification_report, confusion_matrix
print(confusion_matrix(y_test, y_pred_A_rbf))
print(classification_report(y_test, y_pred_A_rbf))

Thank you, Nimish

I do not see the cross-validation part of the code. Assuming you have done it, it probably looks something like this:

import numpy as np
from sklearn.model_selection import KFold
from sklearn.svm import SVC

# Collect the out-of-fold predictions in an array aligned with y_true
y_pred_A_rbf = np.empty_like(y_true)

kf = KFold(n_splits=5, shuffle=True)
for train_index, test_index in kf.split(X):
    X_train, X_test = X[train_index], X[test_index]
    y_train = y_true[train_index]
    svclassifier_rbf = SVC(kernel='rbf')
    svclassifier_rbf.fit(X_train, y_train)
    ###### NEXT LINE NEEDED #######
    y_pred_A_rbf[test_index] = svclassifier_rbf.predict(X_test)

What is missing in your code is svclassifier_rbf.predict(X_test)

This is used to predict your classes. You can now take the values of the variable y_pred_A_rbf and pass them to a confusion matrix to read your true positives, true negatives, false positives and false negatives. A typical confusion matrix in Python can be mapped to the picture below:

[Image: a confusion matrix in Python]
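Since the original image is not reproduced here, here is a minimal sketch of how the four cells can be read out of scikit-learn's confusion_matrix for a binary problem, assuming y_test and y_pred_A_rbf are the arrays from the code above:

from sklearn.metrics import confusion_matrix

# For a binary problem, ravel() flattens the 2x2 matrix in the order
# [[tn, fp], [fn, tp]], so the four counts can be unpacked directly.
tn, fp, fn, tp = confusion_matrix(y_test, y_pred_A_rbf).ravel()
print(f"TP={tp}  TN={tn}  FP={fp}  FN={fn}")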

Now that you have your two arrays of actual and predicted labels, you can compare them element by element: if both the actual label and the predicted label are true (i.e. 1), the record is a true positive and was correctly classified. You can do the same for true negatives, false positives and false negatives to study which records have been classified correctly or incorrectly.

For example, if you want to know which records have been correctly classified as the positive class (let's assume a defect in this case), you can do:

import numpy as np

# Indexes of records where both the actual and predicted label are 1 (defect)
tp = np.where((y_true == 1) & (y_pred == 1))[0]

You will now get the indexes of all the records that have been classified properly as the positive class.
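As a side note, if you would rather not write the fold loop by hand, scikit-learn's cross_val_predict produces the same kind of out-of-fold prediction array in one call; a minimal sketch, assuming X and y_true are the full arrays used above:

from sklearn.model_selection import cross_val_predict
from sklearn.svm import SVC

# Each element of y_pred_A_rbf is predicted by a model that never saw that sample
y_pred_A_rbf = cross_val_predict(SVC(kernel='rbf'), X, y_true, cv=5)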

If you are working on classification problems and just want to test the model's accuracy and behaviour, use

from sklearn.metrics import accuracy_score
accuracy_score(y_test, clf.predict(X_test))
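Since the question mentions 5-fold cross-validation, here is a minimal sketch of checking accuracy across the folds with cross_val_score, assuming X and y are the full feature and label arrays:

from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

# One accuracy score per fold; the mean summarises the model's overall behaviour
scores = cross_val_score(SVC(kernel='rbf'), X, y, cv=5, scoring='accuracy')
print(scores, scores.mean())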

Refer to my Git link for document classification; I've used Naive Bayes on top of TF-IDF/count vectorizer features.

Document classification using MultinomialNB

Hope this helps you with document classification.
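The linked repository is not reproduced here, but a minimal sketch of that kind of pipeline (TF-IDF features feeding MultinomialNB; the texts and labels below are placeholder data, not from the repository) could look like this:

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Placeholder documents and class labels purely for illustration
texts = ["first document", "second document", "another one", "more text"]
labels = [0, 1, 0, 1]

clf_nb = make_pipeline(TfidfVectorizer(), MultinomialNB())
clf_nb.fit(texts, labels)
print(clf_nb.predict(["a new document"]))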

You can get the records that are predicted as defects using the following code. I am assuming that X_test is your test input data.

print(X_test[y_pred_A_rbf==1])

You have many methods to test how accurate your y_pred is. Basically, you need to match y_pred against y_test. If you are new to this field and find it hard to interpret confusion matrices and classification reports, you can simply write your y_pred to a CSV file and compare it with y_test. That will show you the actual situation.

import numpy as np
np.savetxt("filename.csv", y_pred, delimiter=",", fmt="%.5f")
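If you want the actual and predicted labels side by side in the same file for easier comparison, a minimal sketch (the column stacking, file name and integer format are just illustrative, assuming integer class labels):

import numpy as np

# One row per record: actual label in the first column, prediction in the second
np.savetxt("comparison.csv", np.column_stack((y_test, y_pred)),
           delimiter=",", fmt="%d", header="y_test,y_pred", comments="")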


 