This is my code, which performs a classification. I would like to print the accuracy, recall, and precision of my 4 models using cross-validation. I tried and failed, because it always prints the scores for a single set of data and not the overall result. Do you have any idea how to do it?
I would also like to know whether, based on the confusion matrices, it is possible to compare the models and print which ones fail to predict the right label for each set.
@Nikaido, I tried your solution, but the precision and recall it reports do not correspond to the values I get when I compute them manually.
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             precision_score, recall_score)
from sklearn.model_selection import KFold, cross_val_predict
from sklearn.naive_bayes import MultinomialNB
from sklearn.svm import LinearSVC

# verbatim_train_remove_stop_words_lemmatize is the list of preprocessed sentences
tfidf_vectorizer = TfidfVectorizer()
X = tfidf_vectorizer.fit_transform(verbatim_train_remove_stop_words_lemmatize)
total_verbatim = X.shape[0]
print(total_verbatim)

# Ground-truth labels, used to inspect the well-labelled and mislabelled samples
# (error with the configuration using the whole set)
labels = np.zeros(total_verbatim)
labels[:1316] = 0      # motivations
labels[1316:1891] = 1  # freins (barriers)

# One row per sentence; each model adds its own prediction column below
verbatim_preprocess = np.array(verbatim_train_remove_stop_words_lemmatize)
df = pd.DataFrame(data={
    "id": np.arange(total_verbatim),
    "ground_true": labels,
    "original_sentence": verbatim_preprocess,
})

cv_splitter = KFold(n_splits=10, shuffle=False, random_state=None)

model1 = LinearSVC()
model2 = MultinomialNB()
model3 = LogisticRegression()  # (random_state=0)
model4 = RandomForestClassifier()
models = [model1, model2, model3, model4]

for i, model in enumerate(models, start=1):
    # Out-of-fold prediction for every sample
    y_pred = cross_val_predict(model, X, labels, cv=cv_splitter)
    df["pred_model{}".format(i)] = y_pred
    print("Model: {}".format(model))
    print("matrice confusion: {}".format(confusion_matrix(labels, y_pred)))
    print("Accuracy: {}".format(accuracy_score(labels, y_pred)))
    print("Precision: {}".format(precision_score(labels, y_pred)))
    print("Recall: {}".format(recall_score(labels, y_pred)))
    print("F mesure: {}".format(f1_score(labels, y_pred)))

df.to_excel("EXIT.xlsx")
I get this result:
Model: LinearSVC(C=1.0, class_weight=None, dual=True, fit_intercept=True,
          intercept_scaling=1, loss='squared_hinge', max_iter=1000,
          multi_class='ovr', penalty='l2', random_state=None, tol=0.0001,
          verbose=0)
Accuracy: 0.5393971443680592
Precision: 0.13902439024390245
Recall: 0.09913043478260869
F mesure: 0.11573604060913706
matrice confusion: [[963 353]
 [518  57]]
Model: MultinomialNB(alpha=1.0, class_prior=None, fit_prior=True)
Accuracy: 0.6604970914859862
Precision: 0.014492753623188406
Recall: 0.0017391304347826088
F mesure: 0.0031055900621118015
matrice confusion: [[1248   68]
 [ 574    1]]
If I compute the precision and recall manually for the first model (SVM), I get:
Precision: 963/(963+353) = 0.73
Recall: 963/(963+518) = 0.65
How is that possible? Is my code wrong somewhere?
Sklearn offers a lot of tools for cross-validated estimation on different models, and this task can be done in several ways. One way I thought of is:
from sklearn import datasets
from sklearn.svm import LinearSVC
from sklearn.naive_bayes import MultinomialNB
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import precision_score, recall_score, accuracy_score
from sklearn.model_selection import cross_val_predict
from sklearn.model_selection import KFold

# toy problem
cancer = datasets.load_breast_cancer()
X = cancer.data
y = cancer.target

cv_splitter = KFold(n_splits=10, shuffle=False, random_state=None)

model1 = LinearSVC()
model2 = MultinomialNB()
model3 = LogisticRegression()  # (random_state=0)
model4 = RandomForestClassifier()
models = [model1, model2, model3, model4]

for model in models:
    y_pred = cross_val_predict(model, X, y, cv=cv_splitter)
    print("Accuracy: {}".format(accuracy_score(y, y_pred)))
    print("Precision: {}".format(precision_score(y, y_pred)))
    print("Recall: {}".format(recall_score(y, y_pred)))
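A possible variation on the splitter, in case the rows are ordered by class (the label assignment in the question, where the first 1316 rows are class 0 and the rest class 1, suggests they might be). With KFold(shuffle=False) on such data, most test folds contain a single class. The sketch below only illustrates the effect on the toy dataset, which is sorted by class on purpose; whether this explains the scores in the question is an assumption, not something verified here. The StandardScaler pipeline is just a convenience so that LogisticRegression converges.

```python
import numpy as np
from sklearn import datasets
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import KFold, StratifiedKFold, cross_val_predict
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

cancer = datasets.load_breast_cancer()
# Sort the rows by class to mimic a dataset where one class comes first
order = np.argsort(cancer.target, kind="stable")
X, y = cancer.data[order], cancer.target[order]

plain_cv = KFold(n_splits=10, shuffle=False)
stratified_cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)

def count_single_class_folds(cv):
    # Number of test folds in which only one class is present
    return sum(len(np.unique(y[test_idx])) == 1 for _, test_idx in cv.split(X, y))

single_class_plain = count_single_class_folds(plain_cv)
single_class_strat = count_single_class_folds(stratified_cv)
print("Single-class test folds, KFold: {}".format(single_class_plain))
print("Single-class test folds, StratifiedKFold: {}".format(single_class_strat))

# Same out-of-fold evaluation as above, once per splitter
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
accuracies = {}
for name, cv in [("KFold", plain_cv), ("StratifiedKFold", stratified_cv)]:
    y_pred = cross_val_predict(model, X, y, cv=cv)
    accuracies[name] = accuracy_score(y, y_pred)
    print("{} accuracy: {}".format(name, accuracies[name]))
```

StratifiedKFold keeps the class proportions roughly equal in every fold, so each model is tested on both classes in every split.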
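Basically I used cross_val_predict, which returns, for every sample, the prediction made while that sample was in the held-out fold, so the scores are computed over the whole dataset rather than per fold.
Having the predicted labels for every model (y_pred in the for loop), you can then do the comparison that you need.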
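One way such a comparison could look, as a minimal sketch on the same toy problem: keep each model's out-of-fold predictions in a DataFrame column and flag the samples each model gets wrong. The column names (pred_*, wrong_*, n_models_wrong) and the short model names are made up for this example, and the model settings (max_iter, n_estimators, random_state) are assumptions chosen only to keep the run short and repeatable.

```python
import pandas as pd
from sklearn import datasets
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_val_predict
from sklearn.naive_bayes import MultinomialNB

cancer = datasets.load_breast_cancer()
X, y = cancer.data, cancer.target

cv_splitter = KFold(n_splits=10, shuffle=False)

# Hypothetical short names, used as column suffixes
models = {
    "nb": MultinomialNB(),
    "logreg": LogisticRegression(max_iter=5000),
    "rf": RandomForestClassifier(n_estimators=50, random_state=0),
}

results = pd.DataFrame({"ground_truth": y})
for name, model in models.items():
    # Out-of-fold prediction of this model for every sample
    results["pred_" + name] = cross_val_predict(model, X, y, cv=cv_splitter)
    # True where this model predicted the wrong label
    results["wrong_" + name] = results["pred_" + name] != results["ground_truth"]

wrong_cols = ["wrong_" + name for name in models]
results["n_models_wrong"] = results[wrong_cols].sum(axis=1)

# Number of misclassified samples per model
print(results[wrong_cols].sum())

# Samples that at least one model got wrong, hardest first
hard = results[results["n_models_wrong"] > 0]
print(hard.sort_values("n_models_wrong", ascending=False).head())
```

Adding the original sentences as another column would make it possible to read the misclassified texts directly, for instance after exporting with to_excel as in the question.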
Details on the cross_val_predict method are in the scikit-learn documentation.
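Regarding the mismatch with the manual calculation: in the matrix returned by confusion_matrix, rows are the true labels and columns are the predicted labels, and precision_score and recall_score report the class given by pos_label, which defaults to 1. The manual figures 963/(963+353) and 963/(963+518) therefore look like the recall and the precision of class 0, not the scores of class 1 that were printed. The sketch below rebuilds label vectors from the LinearSVC matrix in the question (the vectors are synthetic, only the counts come from the question) and prints the scores for both classes.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, precision_score, recall_score

# Confusion matrix printed for LinearSVC in the question
cm = np.array([[963, 353],
               [518, 57]])

# Synthetic label vectors that reproduce exactly this matrix:
# cell order is (true 0, pred 0), (true 0, pred 1), (true 1, pred 0), (true 1, pred 1)
y_true = np.repeat([0, 0, 1, 1], cm.ravel())
y_pred = np.repeat([0, 1, 0, 1], cm.ravel())
assert (confusion_matrix(y_true, y_pred) == cm).all()

scores = {}
for label in (0, 1):
    precision = precision_score(y_true, y_pred, pos_label=label)
    recall = recall_score(y_true, y_pred, pos_label=label)
    scores[label] = (precision, recall)
    print("class {}: precision={:.4f} recall={:.4f}".format(label, precision, recall))
# class 0: precision=0.6502 recall=0.7318
# class 1: precision=0.1390 recall=0.0991
```

Passing pos_label=0, or average=None to get one score per class, gives the class 0 values directly from the real predictions.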