Accuracy and F1 score are identical when doing multi-label classification

Question

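I have written code based on this site and run several different multi-label classifiers.

I want to evaluate my models using per-class accuracy and per-class F1 measure.

The problem is that I get exactly the same accuracy and F1 values for all of my models.

I suspect I am doing something wrong. I would like to know under what circumstances this can happen.

The code is exactly the same as on that site, and I compute the F1 measure like this: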

from sklearn.metrics import accuracy_score, f1_score
print('Logistic Test accuracy is {} '.format(accuracy_score(test[category], prediction)))
print('Logistic f1 measurement is {} '.format(f1_score(test[category], prediction, average='micro')))

Update 1

Here is the whole code:

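import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.multiclass import OneVsRestClassifier
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import accuracy_score, f1_score

# stop_words is not defined in the code shown; it is assumed to come from the original tutorial.
df = pd.read_csv("finalupdatedothers.csv")
categories = ['ADR', 'WD', 'EF', 'INF', 'SSI', 'DI', 'others']

train, test = train_test_split(df, random_state=42, test_size=0.3, shuffle=True)
X_train = train.sentences
X_test = test.sentences

NB_pipeline = Pipeline([('tfidf', TfidfVectorizer(stop_words=stop_words)),
                        ('clf', OneVsRestClassifier(MultinomialNB(fit_prior=True, class_prior=None))),])
for category in categories:
    print('processing {} '.format(category))
    NB_pipeline.fit(X_train, train[category])
    prediction = NB_pipeline.predict(X_test)
    print('NB test accuracy is {} '.format(accuracy_score(test[category], prediction)))
    print('NB f1 measurement is {} '.format(f1_score(test[category], prediction, average='micro')))
    print("\n")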

Here is the output:

processing ADR 
NB test accuracy is 0.821963394343 
NB f1 measurement is 0.821963394343

This is what my data looks like:

,sentences,ADR,WD,EF,INF,SSI,DI,others
0,"extreme weight gain, short-term memory loss, hair loss.",1,0,0,0,0,0,0
1,I am detoxing from Lexapro now.,0,0,0,0,0,0,1
2,I slowly cut my dosage over several months and took vitamin supplements to help.,0,0,0,0,0,0,1
3,I am now 10 days completely off and OMG is it rough.,0,0,0,0,0,0,1
4,"I have flu-like symptoms, dizziness, major mood swings, lots of anxiety, tiredness.",0,1,0,0,0,0,0
5,I have no idea when this will end.,1,0,0,0,0,0,1

Why do I get the same numbers?

Thanks.

Answer 1

By doing this:

for category in categories:
...
...

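you are actually turning the multi-label problem into a series of binary problems. If you want to keep doing that, you don't need OneVsRestClassifier; you can use MultinomialNB directly. Otherwise, you can let OneVsRestClassifier handle all the labels in one go: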

# Send all labels at once.
NB_pipeline.fit(X_train, train[categories])
prediction = NB_pipeline.predict(X_test)
# With a multi-label target, accuracy_score computes subset accuracy.
print('NB test accuracy is {} '.format(accuracy_score(test[categories], prediction)))
print('NB f1 measurement is {} '.format(f1_score(test[categories], prediction, average='micro')))

It may warn that some labels are present in all training examples, but that is only because the sample data you posted is so small.
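A minimal sketch of why the two numbers stop coinciding once all labels are scored together. The label-indicator matrices below are hypothetical (4 samples, 3 labels), not the asker's data. With a 2-D indicator target, scikit-learn's accuracy_score computes subset accuracy (a sample counts as correct only if its entire label row matches), while f1_score(..., average='micro') pools true positives, false positives and false negatives over every label. Passing average=None returns one F1 per label, and per-label accuracy can be taken column by column, which is closer to the per-class evaluation asked about in the question.

```python
import numpy as np
from sklearn.metrics import accuracy_score, f1_score

# Hypothetical label-indicator matrices: 4 samples x 3 labels (not the asker's data).
y_true = np.array([[1, 0, 0],
                   [0, 1, 0],
                   [1, 0, 1],
                   [0, 0, 1]])
y_pred = np.array([[1, 0, 0],
                   [0, 1, 1],
                   [1, 0, 0],
                   [0, 0, 1]])

# Subset accuracy: a row counts only if every label in it is right (2 of 4 rows -> 0.5).
subset_acc = accuracy_score(y_true, y_pred)
# Micro-F1 pools counts over all labels: TP=4, FP=1, FN=1 -> 0.8.
micro_f1 = f1_score(y_true, y_pred, average='micro')
# One F1 score per label column.
per_label_f1 = f1_score(y_true, y_pred, average=None)
# Per-label accuracy: fraction of samples where each column is predicted correctly.
per_label_acc = (y_true == y_pred).mean(axis=0)

print('subset accuracy', subset_acc)
print('micro F1', micro_f1)
print('per-label F1', per_label_f1)
print('per-label accuracy', per_label_acc)
```

Here subset accuracy is 0.5 and micro F1 is 0.8, while both per-label vectors come out as [1.0, 1.0, 0.5].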

@user2906838, you are right about the scores. When average='micro', the results will be equal. This is mentioned in the documentation:

Note that "micro"-averaging in a multiclass setting with all labels included will produce equal precision, recall and F.

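It is written there for multiclass, but I think the same applies to binary. See this similar question, where a user computed all the scores by hand: Multi-class Classification: Micro-Average Accuracy, Precision, Recall and F Score All Equal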
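The binary case can be checked by hand. When both classes are included, every misclassified sample is a false positive for one class and a false negative for the other, so the pooled FP and FN counts both equal the number of errors, and the pooled TP count equals the number of correct predictions. Micro precision, recall and F1 therefore all reduce to accuracy. This is a minimal sketch on a hypothetical 0/1 label column; micro_f1_by_hand is a hypothetical helper written only for illustration.

```python
from sklearn.metrics import accuracy_score, f1_score

# Hypothetical single binary label column (not the asker's data).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 0, 1, 0]


def micro_f1_by_hand(y_true, y_pred):
    """Pool TP/FP/FN over every class that occurs, as average='micro' does."""
    classes = set(y_true) | set(y_pred)
    tp = fp = fn = 0
    for c in classes:
        tp += sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp += sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn += sum(t == c and p != c for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


acc = accuracy_score(y_true, y_pred)               # 6 of 8 correct
micro = f1_score(y_true, y_pred, average='micro')  # pools both classes
manual = micro_f1_by_hand(y_true, y_pred)          # TP=6, FP=2, FN=2
print(acc, micro, manual)
```

All three values come out as 0.75 for this input.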

Answer 2

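Well, this is probably because accuracy_score and f1_score are both returning the same score. Although they are computed differently, they can end up with the same result. If you want to know more about how they are calculated, there is already an answer here: How to compute precision, recall, accuracy and f1-score for the multiclass case with scikit learn?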

As for your current problem of identical scores, change the value of average from micro to weighted. That will actually change your scores, as I pointed out in the comments.
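A minimal sketch comparing the averaging options on one hypothetical, imbalanced binary column (not the asker's data). average='micro' reproduces accuracy, average='weighted' takes the support-weighted mean of the per-class F1 scores, and average='binary' (the default for scikit-learn's f1_score) reports F1 for the positive label 1 only, which may be the closer match for a per-category loop like the one in the question. average=None returns the per-class scores, ordered by sorted label.

```python
from sklearn.metrics import accuracy_score, f1_score

# Hypothetical imbalanced binary column: 3 positives, 5 negatives.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0, 0, 1]

acc = accuracy_score(y_true, y_pred)  # 5 of 8 correct

# Same predictions scored with different averaging strategies.
scores = {avg: f1_score(y_true, y_pred, average=avg)
          for avg in ('micro', 'weighted', 'binary')}
# One F1 per class, ordered by sorted label: [F1 for 0, F1 for 1].
per_class = f1_score(y_true, y_pred, average=None)

print('accuracy', acc)
for avg, value in scores.items():
    print(avg, value)
print('per class', per_class)
```

With these inputs, micro F1 equals the accuracy of 0.625, while weighted F1 is about 0.605 and binary F1 is 0.4.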


2 solutions

Solution 1: score 4, 2018-08-13 05:37:40
Solution 2: score 1, accepted, 2018-08-13 05:40:22
