How to calculate Precision, Recall and F-score using python?
Error when computing recall, precision and F-score of 4 models using cross-validation?
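This is my classification code. I want to print the accuracy, recall and precision of my 4 models using cross-validation. I tried, but failed, because it always prints the results for one set of data instead of for the whole data. Do you know how to do it?
I would also like to know whether I can compare the models based on my confusion matrix, so as to print which one fails to predict the correct label for each set. So @Nikaido, I tried your solution, but the precision and recall results do not match the values I get when I compute them by hand.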
import numpy as np
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import KFold, cross_val_predict
from sklearn.svm import LinearSVC
from sklearn.naive_bayes import MultinomialNB
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             precision_score, recall_score)

# verbatim_train_remove_stop_words_lemmatize: the preprocessed corpus (list of strings), built earlier
tfidf_vectorizer = TfidfVectorizer()
tfidf_vectorizer.fit(verbatim_train_remove_stop_words_lemmatize)
X = tfidf_vectorizer.transform(verbatim_train_remove_stop_words_lemmatize)
total_verbatim = X.shape[0]
print(total_verbatim)

# Create the label vector (used later to inspect mislabeled + correctly labeled rows)
# (error with this configuration on the whole set)
labels = np.zeros(total_verbatim)  # every row starts as class 0: "motivations"
labels[1316:1891] = 1              # rows 1316-1890 become class 1: "freins" (barriers)

cv_splitter = KFold(n_splits=10, shuffle=False, random_state=None)
model1 = LinearSVC()
model2 = MultinomialNB()
model3 = LogisticRegression()  # (random_state=0)
model4 = RandomForestClassifier()
models = [model1, model2, model3, model4]

verbatim_preprocess = np.array(verbatim_train_remove_stop_words_lemmatize)
# One row per sentence; a pred_modelN column is added for each model in the loop
df = pd.DataFrame(data={
    "id": np.arange(total_verbatim),
    "ground_true": labels,
    "original_sentence": verbatim_preprocess,
})
for i, model in enumerate(models, start=1):
    y_pred = cross_val_predict(model, X, labels, cv=cv_splitter)
    df["pred_model{}".format(i)] = y_pred  # out-of-fold predictions of model i
    print("Model: {}".format(model))
    print("matrice confusion: {}".format(confusion_matrix(labels, y_pred)))
    print("Accuracy: {}".format(accuracy_score(labels, y_pred)))
    print("Precision: {}".format(precision_score(labels, y_pred)))
    print("Recall: {}".format(recall_score(labels, y_pred)))
    print("F mesure: {}".format(f1_score(labels, y_pred)))
df.to_excel("EXIT.xlsx")
I get this result:
Model: LinearSVC(C=1.0, class_weight=None, dual=True, fit_intercept=True,
intercept_scaling=1, loss='squared_hinge', max_iter=1000,
multi_class='ovr', penalty='l2', random_state=None, tol=0.0001,
verbose=0)
Accuracy: 0.5393971443680592
Precision: 0.13902439024390245
Recall: 0.09913043478260869
F mesure: 0.11573604060913706
matrice confusion: [[963 353]
[518 57]]
Model: MultinomialNB(alpha=1.0, class_prior=None, fit_prior=True)
Accuracy: 0.6604970914859862
Precision: 0.014492753623188406
Recall: 0.0017391304347826088
F mesure: 0.0031055900621118015
matrice confusion: [[1248 68]
[ 574 1]]
If I compute the precision and recall of the first model by hand:
For the SVM: precision = 963/(963+353) = 0.73, recall = 963/(963+518) = 0.65
How come? Is my code wrong somewhere?
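A minimal sketch of where the two sets of numbers come from. It rebuilds hypothetical label arrays whose only purpose is to reproduce the LinearSVC confusion matrix printed above (the counts come from the output; the arrays themselves are synthetic). scikit-learn's confusion_matrix puts true classes in rows and predicted classes in columns, and precision_score / recall_score default to pos_label=1, so the reported 0.139 and 0.099 are the scores for class 1 ("freins"). The hand calculation instead uses class 0 as the positive class and swaps the two formulas: 963/(963+353) is the recall of class 0, and 963/(963+518) is its precision. The accuracy of 0.539 is (963 + 57) / 1891 either way.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, f1_score, precision_score, recall_score

# Hypothetical labels that reproduce the LinearSVC confusion matrix above.
# confusion_matrix layout: rows = true class, columns = predicted class.
#   true 0 -> pred 0: 963    true 0 -> pred 1: 353
#   true 1 -> pred 0: 518    true 1 -> pred 1: 57
y_true = np.array([0] * 963 + [0] * 353 + [1] * 518 + [1] * 57)
y_pred = np.array([0] * 963 + [1] * 353 + [0] * 518 + [1] * 57)

cm = confusion_matrix(y_true, y_pred)
tn, fp, fn, tp = cm.ravel()  # binary case: [[tn, fp], [fn, tp]] with class 1 as positive
print(cm)

# Default binary scores use pos_label=1 ("freins")
prec_pos1 = precision_score(y_true, y_pred)  # tp / (tp + fp) = 57 / 410
rec_pos1 = recall_score(y_true, y_pred)      # tp / (tp + fn) = 57 / 575
f1_pos1 = f1_score(y_true, y_pred)

# The hand calculation effectively treats class 0 ("motivations") as positive
prec_pos0 = precision_score(y_true, y_pred, pos_label=0)  # 963 / (963 + 518), about 0.65
rec_pos0 = recall_score(y_true, y_pred, pos_label=0)      # 963 / (963 + 353), about 0.73

# average=None returns one score per class, ordered [class 0, class 1]
per_class_precision = precision_score(y_true, y_pred, average=None)
per_class_recall = recall_score(y_true, y_pred, average=None)

print("pos_label=1 -> precision", prec_pos1, "recall", rec_pos1, "F1", f1_pos1)
print("pos_label=0 -> precision", prec_pos0, "recall", rec_pos0)
print("per class: precision", per_class_precision, "recall", per_class_recall)
```

Passing pos_label=0, or average=None to see both classes at once, is one way to make the library output line up with a hand calculation; which class should count as "positive" depends on the task.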
Sklearn provides a lot of tools for cross-validated evaluation of different models. The task can be done in different ways; one that comes to mind is:
from sklearn import datasets
from sklearn.svm import LinearSVC
from sklearn.naive_bayes import MultinomialNB
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import precision_score, recall_score, accuracy_score
from sklearn.model_selection import cross_val_predict
from sklearn.model_selection import KFold
# toy problem
cancer = datasets.load_breast_cancer()
X = cancer.data
y = cancer.target
cv_splitter = KFold(n_splits=10, shuffle=False, random_state=None)
model1 = LinearSVC()
model2 = MultinomialNB()
model3 = LogisticRegression() #(random_state=0)
model4 = RandomForestClassifier()
models = [model1, model2, model3, model4]
for model in models:
    y_pred = cross_val_predict(model, X, y, cv=cv_splitter)
    print("Accuracy: {}".format(accuracy_score(y, y_pred)))
    print("Precision: {}".format(precision_score(y, y_pred)))
    print("Recall: {}".format(recall_score(y, y_pred)))
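Basically, I used cross_val_predict to get the predicted labels of each model (y_pred in the for loop); then you can make whatever comparisons you need. Details of the cross_val_predict method are in the scikit-learn documentation.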
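For the other part of the question (which model fails on which sample), a minimal sketch building on the answer's toy setup: each model's out-of-fold predictions from cross_val_predict go into their own column of one DataFrame, and comparing those columns with the true label gives per-model error counts plus the rows that at least one model gets wrong. The pred_<name> column naming and the max_iter / random_state settings are illustrative choices, not part of the original answer; with the question's data, X and labels would replace the breast-cancer arrays.

```python
import pandas as pd
from sklearn import datasets
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_val_predict
from sklearn.naive_bayes import MultinomialNB
from sklearn.svm import LinearSVC

# Same toy problem as in the answer
cancer = datasets.load_breast_cancer()
X, y = cancer.data, cancer.target

cv_splitter = KFold(n_splits=10, shuffle=False)
models = {
    "LinearSVC": LinearSVC(),
    "MultinomialNB": MultinomialNB(),
    "LogisticRegression": LogisticRegression(max_iter=5000),
    "RandomForest": RandomForestClassifier(random_state=0),
}

# One row per sample: the true label plus one out-of-fold prediction column per model
df = pd.DataFrame({"id": range(len(y)), "ground_true": y})
for name, model in models.items():
    df["pred_" + name] = cross_val_predict(model, X, y, cv=cv_splitter)

pred_cols = ["pred_" + name for name in models]
# Boolean table: True where a model's prediction differs from the true label
wrong = df[pred_cols].ne(df["ground_true"], axis=0)

# Number of misclassified samples per model
errors_per_model = wrong.sum()
print(errors_per_model)

# Samples misclassified by at least one model, with every model's prediction side by side
disagreements = df[wrong.any(axis=1)]
print(disagreements.head())
```

Because every prediction is out-of-fold, the rows of df line up one-to-one with the original samples, so the per-model columns can be compared directly (or exported, as the question does with to_excel).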