
Why is my accuracy_score metric incorrect? scikit-learn

I have code that mostly works but is giving me trouble. My accuracy_score looks almost random, whereas my printout of the predicted values suggests otherwise. I was following this tutorial online, and here is what I have written so far:

import os
import pandas as pd
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score, confusion_matrix

adult_train = pd.read_csv(os.path.join(os.path.expanduser("~/Desktop"), "adult_train_srt.csv"), sep=',')
print(adult_train.head(100))

le = LabelEncoder()
# Numeric columns: each distinct value is replaced by its rank (0, 1, 2, ...)
for col in ['age', 'hours_per_week']:
    adult_train[col] = le.fit_transform(adult_train[col])
# Categorical columns: astype(str) turns missing values into the string 'nan' so they get their own code
for col in ['workclass', 'education', 'occupation', 'race', 'sex', 'native_country', 'classs']:
    adult_train[col] = le.fit_transform(adult_train[col].astype(str))

cols = [col for col in adult_train.columns if col not in ['classs']]
data = adult_train[cols]
target = adult_train['classs']

data_train, data_test, target_train, target_test = train_test_split(data, target, test_size=0.1)  # pass random_state=42 for a reproducible split

gnb = GaussianNB()
gnb.fit(data_train, target_train)  # train on the 90% split
pred_gnb = gnb.predict(data_test)  # one prediction per row of data_test
print(pred_gnb)

print("Naive-Bayes accuracy: (TN + TP) / ALL =", accuracy_score(target_test, pred_gnb))  # y_true first, then y_pred
print("""Confusion matrix (rows = actual class, columns = predicted class):
TN  FP
FN  TP
""")
print(confusion_matrix(target_test, pred_gnb))

Prediction = pd.DataFrame({'Prediction':pred_gnb})

result = pd.concat([adult_train, Prediction], axis=1)
print(result.head(10))

I am at a loss: I can't tell whether my DataFrame concatenation is working or whether accuracy_score is measuring something else, because I get output like this:

[screenshot of output]

In this particular instance it reports 7 true negatives (OK), 1 false positive (???), 2 false negatives (OK), and 0 true positives (??? but one was guessed correctly?). The [classs] column is what the [Prediction] column is trying to predict.
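For reference, scikit-learn's confusion_matrix(y_true, y_pred) puts actual classes on the rows and predicted classes on the columns, so with labels 0 and 1 it returns [[TN, FP], [FN, TP]], and accuracy is (TN + TP) / total. The sketch below uses hypothetical labels chosen only to reproduce the 7/1/2/0 counts above; they are not the real test set.

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix

# Hypothetical labels for a 10-row test set (chosen to give 7 TN, 1 FP, 2 FN, 0 TP)
y_true = np.array([0, 0, 0, 0, 0, 0, 0, 0, 1, 1])
y_pred = np.array([0, 0, 0, 0, 0, 0, 0, 1, 0, 0])

cm = confusion_matrix(y_true, y_pred)  # rows = actual, columns = predicted
tn, fp, fn, tp = cm.ravel()            # unpack in [[TN, FP], [FN, TP]] order
print(cm)
print("TN, FP, FN, TP:", tn, fp, fn, tp)

# Accuracy computed by hand from the matrix vs. accuracy_score
manual = (tn + tp) / cm.sum()
print("manual accuracy:", manual)
print("accuracy_score: ", accuracy_score(y_true, y_pred))
```

With these counts accuracy is (7 + 0) / 10 = 0.7. Since TP is 0, no test row whose actual label is 1 was predicted as 1, so a printed table that seems to show a correct 1 is likely pairing predictions with rows they were not made for.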

result = pd.concat([adult_train, Prediction], axis=1)

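Here the Prediction DataFrame should not be concatenated with adult_train. pd.concat(..., axis=1) pairs rows by index label, and Prediction has a fresh 0..n-1 index while adult_train holds every row, so each prediction lands beside an unrelated row. Prediction is the result of predicting on the test set data_test: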

pred_gnb = gnb.predict(data_test)
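To see why the alignment matters: pd.concat(..., axis=1) matches rows by index label, not by position. train_test_split keeps the original (shuffled) row labels in the test split, while a DataFrame built from the raw prediction array gets a fresh 0..n-1 index. A minimal sketch on a hypothetical 10-row toy frame, where the "predictions" are simply copies of the true test labels, so any mismatch is purely an alignment problem:

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical toy data standing in for adult_train
df = pd.DataFrame({"feature": range(10), "classs": [0, 1] * 5})
X, y = df[["feature"]], df["classs"]

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# Pretend predictions: copies of the true test labels, so every prediction is "correct"
pred = y_test.to_numpy()

# Wrong: predictions get index 0..2 and are glued onto rows 0..2 of the full frame
wrong = pd.concat([df, pd.DataFrame({"Prediction": pred})], axis=1)
print(wrong)  # 10 rows, Prediction is NaN for 7 of them

# Right: give the predictions the same index as the test rows they belong to
right = pd.concat(
    [X_test, y_test, pd.Series(pred, index=X_test.index, name="Prediction")], axis=1
)
print(right)  # 3 rows, Prediction always equals classs
```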

So I think you should concatenate data_test, target_test, and Prediction instead. Try this:

Prediction = pd.DataFrame({'Prediction': pred_gnb}, index=data_test.index)  # reuse the test rows' index
result = pd.concat([data_test, target_test, Prediction], axis=1)
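An end-to-end sketch of the aligned version, assuming synthetic integer features in place of the real CSV (the data below is made up; only the variable names follow the question). Once the Prediction column carries data_test's index, the fraction of rows where classs equals Prediction is exactly what accuracy_score reports:

```python
import numpy as np
import pandas as pd
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Hypothetical synthetic data standing in for the encoded adult_train frame
rng = np.random.default_rng(42)
adult_train = pd.DataFrame({
    "age": rng.integers(0, 50, size=100),
    "hours_per_week": rng.integers(0, 40, size=100),
})
adult_train["classs"] = (adult_train["age"] + rng.integers(0, 20, size=100) > 35).astype(int)

cols = [col for col in adult_train.columns if col != "classs"]
data_train, data_test, target_train, target_test = train_test_split(
    adult_train[cols], adult_train["classs"], test_size=0.1, random_state=42
)

gnb = GaussianNB().fit(data_train, target_train)
pred_gnb = gnb.predict(data_test)

# Build the Prediction column on the test rows' own index before concatenating
Prediction = pd.DataFrame({"Prediction": pred_gnb}, index=data_test.index)
result = pd.concat([data_test, target_test, Prediction], axis=1)
print(result)

# With rows aligned, the share of matching rows equals accuracy_score
match_rate = (result["classs"] == result["Prediction"]).mean()
print(match_rate, accuracy_score(target_test, pred_gnb))
```

Passing random_state to train_test_split also makes the split, and therefore the score, repeatable between runs. With test_size=0.1 the question's test set had only 10 rows, so each misclassified row moves accuracy by 0.1 and the score can swing noticeably from one random split to the next.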


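Notice: technical posts on this site are licensed under CC BY-SA 4.0. If you republish, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.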

 
Guangdong ICP No. 18138465  © 2020-2024 STACKOOM.COM