
Why is my accuracy_score metric incorrect? (scikit-learn)

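I have some working code that is giving me trouble. I seem to get an almost random accuracy_score value, while the printout of my predicted values suggests otherwise. I am following an online tutorial, and this is what I have written so far: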

import os
import pandas as pd
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score, confusion_matrix

adult_train = pd.read_csv(os.path.expanduser("~/Desktop/") + "adult_train_srt.csv", sep=',')
print(adult_train.head(100))

le = LabelEncoder()
adult_train['age'] = le.fit_transform(adult_train['age'])
adult_train['workclass'] = le.fit_transform(adult_train['workclass'].astype(str))
adult_train['education'] = le.fit_transform(adult_train['education'].astype(str))
adult_train['occupation'] = le.fit_transform(adult_train['occupation'].astype(str))
adult_train['race'] = le.fit_transform(adult_train['race'].astype(str))
adult_train['sex'] = le.fit_transform(adult_train['sex'].astype(str))
adult_train['hours_per_week'] = le.fit_transform(adult_train['hours_per_week'])
adult_train['native_country'] = le.fit_transform(adult_train['native_country'].astype(str))
adult_train['classs'] = le.fit_transform(adult_train['classs'].astype(str))

cols = [col for col in adult_train.columns if col not in ['classs']]
data = adult_train[cols]
target = adult_train['classs']

data_train, data_test, target_train, target_test = train_test_split(data, target, test_size = 0.1) #, random_state = 42)

gnb = GaussianNB()
pred = gnb.fit(data_train, target_train).predict(data_test)
pred_gnb = gnb.predict(data_test)
print(pred_gnb)

print("Naive-Bayes accuracy: ((TN + TP) / ALL) ", accuracy_score(target_test, pred_gnb)) #normalize = True
print("""Confusion matrix:
TN - FP
FN - TP
Guessed:
0s +, 1s -
0s -, 1s +
""")
print(confusion_matrix(target_test, pred_gnb))

Prediction = pd.DataFrame({'Prediction':pred_gnb})

result = pd.concat([adult_train, Prediction], axis=1)
print(result.head(10))

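I'm at a loss. I can't tell whether my DataFrame concatenation is working, or whether the accuracy_score metric is measuring something else, because I get output like this: [screenshot of the output]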

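This particular example reports 7 true negatives (OK), 1 false positive (???), 2 false negatives (OK), and 0 true positives (???, but one was guessed correctly?). The [classs] column is what the [Prediction] column is trying to guess.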
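To make the counts above concrete, here is a minimal sketch with made-up labels (not the asker's data) chosen to give 7 true negatives, 1 false positive, 2 false negatives, and 0 true positives. It relies on scikit-learn's documented layout for binary labels, where rows are true classes and columns are predicted classes, so the matrix reads [[TN, FP], [FN, TP]]:

```python
from sklearn.metrics import accuracy_score, confusion_matrix

# Made-up labels, chosen only so that the counts match the example above
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 0, 0, 0, 0, 1, 0, 0]

# Rows are true classes, columns are predicted classes
cm = confusion_matrix(y_true, y_pred)
tn, fp, fn, tp = cm.ravel()

# Accuracy is the share of correct predictions: (TN + TP) / ALL
manual_acc = (tn + tp) / (tn + fp + fn + tp)
acc = accuracy_score(y_true, y_pred)

print(cm)
print(manual_acc, acc)
```

With these counts the accuracy is (7 + 0) / 10 = 0.7, and accuracy_score agrees with the value computed by hand from the matrix.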

result = pd.concat([adult_train, Prediction], axis=1)

The Prediction DataFrame should not be concatenated with adult_train here. Prediction holds the predictions for the test set, data_test:

pred_gnb = gnb.predict(data_test)

So I think you should concatenate data_test, target_test, and Prediction instead. Try this; it may work:

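# Reset the shuffled test-set index so the rows line up positionally with Prediction (index 0..n-1)
result = pd.concat([data_test.reset_index(drop=True), target_test.reset_index(drop=True), Prediction], axis=1)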
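One detail worth checking when building this comparison table: pd.concat with axis=1 aligns rows by index label, not by position, and train_test_split keeps the original (shuffled) row labels on the test split. The sketch below uses small hypothetical stand-ins for data_test and target_test (the values and index labels are invented for illustration) to show what happens when the predictions carry a fresh 0..n-1 index versus the test set's own index:

```python
import pandas as pd

# Hypothetical stand-ins for a shuffled test split: note the non-sequential index
data_test = pd.DataFrame({"age": [39, 50, 28]}, index=[7, 12, 5])
target_test = pd.Series([0, 1, 0], index=[7, 12, 5], name="classs")
pred = [0, 1, 1]  # made-up predictions, in the same row order as data_test

# A fresh DataFrame gets index 0..2, so concat matches on labels and pads with NaN
misaligned = pd.concat(
    [data_test, target_test, pd.DataFrame({"Prediction": pred})], axis=1
)

# Reusing the test index keeps every prediction on its own row
aligned = pd.concat(
    [data_test, target_test, pd.Series(pred, index=data_test.index, name="Prediction")],
    axis=1,
)

print(misaligned)
print(aligned)
```

In the first result no label is shared, so each row has either features or a prediction, never both. Giving the predictions the index of data_test (or resetting the index on all pieces) avoids this.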

