
Finding correctly and incorrectly classified data

I want to find which raw data rows are classified correctly and which are not after applying the Multinomial Naive Bayes classification algorithm. For instance, I got an accuracy of 88% after applying Multinomial Naive Bayes classification. I want to see the 12% of the data that was misclassified and also the 88% that was classified correctly. Thanks in advance.

My data set:

+----------------------+------------+
| Details              | Category   |
+----------------------+------------+
| Any raw text1        | cat1       |
+----------------------+------------+
| any raw text2        | cat1       |
+----------------------+------------+
| any raw text5        | cat2       |
+----------------------+------------+
| any raw text7        | cat1       |
+----------------------+------------+
| any raw text8        | cat2       |
+----------------------+------------+
| Any raw text4        | cat4       |
+----------------------+------------+
| any raw text5        | cat4       |
+----------------------+------------+
| any raw text6        | cat3       |
+----------------------+------------+

My code:

import pandas as pd
import numpy as np
import scipy as sp
from sklearn.naive_bayes import MultinomialNB
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
import matplotlib.pyplot as plt

data = pd.read_csv('mydat.xls', delimiter='\t',
                   usecols=['Details', 'Category'], encoding='utf-8')
target_one = data['Category']
target_list = data['Category'].unique()
x_train, x_test, y_train, y_test = train_test_split(
    data.Details, data.Category, random_state=42)
vect = CountVectorizer(ngram_range=(1,2))
# Convert the training text into a numeric feature matrix
X_train = vect.fit_transform(x_train.values.astype('U'))
# Convert the test text using the vocabulary learned from the training set
X_test = vect.transform(x_test.values.astype('U'))

mnb = MultinomialNB(alpha=0.13)
mnb.fit(X_train, y_train)

result = mnb.predict(X_test)

# mnb.predict_proba(X_test)[0:10, 1]
accuracy_score(y_test, result)

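Just iterate over your test data. Use .iloc for positional access, because x_test and y_test keep the shuffled index of the original DataFrame:

for i in range(len(y_test)):
    # Print the raw text (x_test), not the vectorized row (X_test)
    if result[i] == y_test.iloc[i]:
        print("CORRECT: ", x_test.iloc[i])
    else:
        print("INCORRECT: ", x_test.iloc[i])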

You can append them to two separate lists, print only the row index, or do whatever else you need instead.
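As an alternative to the loop, the split into two groups can be sketched with a boolean mask in pandas. This is only a minimal sketch: the toy sentences and the "travel"/"code" labels are made-up stand-ins for the Details and Category columns, and the names report, mask, correct and incorrect are introduced here for illustration.

```python
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB

# Hypothetical toy data standing in for the Details/Category columns.
data = pd.DataFrame({
    "Details": [
        "cheap flights to paris", "hotel booking in rome",
        "flight delay refund", "train tickets to berlin",
        "python list comprehension", "java null pointer error",
        "rust borrow checker error", "python pandas dataframe",
    ],
    "Category": ["travel"] * 4 + ["code"] * 4,
})

x_train, x_test, y_train, y_test = train_test_split(
    data.Details, data.Category, random_state=42)

vect = CountVectorizer(ngram_range=(1, 2))
mnb = MultinomialNB(alpha=0.13)
mnb.fit(vect.fit_transform(x_train), y_train)
result = mnb.predict(vect.transform(x_test))

# One row per test sample; the row labels are the original positions in `data`.
report = pd.DataFrame({"Details": x_test, "actual": y_test})
report["predicted"] = result  # assigned positionally, same order as x_test

# True where the prediction matches the real label.
mask = report["actual"] == report["predicted"]
correct = report[mask]      # correctly classified rows
incorrect = report[~mask]   # misclassified rows

print(correct)
print(incorrect)
```

The fraction len(correct) / len(report) is the same number that accuracy_score returns, and the index of each frame points back to the corresponding row of the original data.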

