
Getting 100% Accuracy on my DecisionTree Model

Here is my code. It always returns 100% accuracy, regardless of how large the test size is. I used the train_test_split method, so I do not believe the training and test sets share any duplicate rows. Could someone inspect my code?

from sklearn.tree import DecisionTreeClassifier
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score


data = pd.read_csv('housing.csv')

prices = data['median_house_value']
features = data.drop(['median_house_value', 'ocean_proximity'], axis = 1)

prices.shape
(20640,)

features.shape
(20640, 8)


X_train, X_test, y_train, y_test = train_test_split(features, prices, test_size=0.2, random_state=42)

X_train = X_train.dropna()
y_train = y_train.dropna()
X_test = X_test.dropna()
y_test = X_test.dropna()

model = DecisionTreeClassifier()
model.fit(X_train, y_train)

y_train.shape
(16512,)

X_train.shape
(16512, 8)


predictions = model.predict(X_test)
score = model.score(y_test, predictions)
score 

EDIT: I have reworked my answer after finding multiple issues. Please copy and paste the code below to make sure no bugs are left.

Issues:

  1. You are using DecisionTreeClassifier instead of DecisionTreeRegressor for a regression problem. median_house_value is a continuous target, so a classifier treats every distinct price as its own class.
  2. You are removing NaNs after the train/test split, and separately for the features and the target, which can leave X and y with different numbers of samples. Call data.dropna() before the split.
  3. You are using model.score incorrectly by passing it (y_test, predictions); it expects the features and the true targets, as in model.score(X_test, y_test). To score the predictions directly, pass the true targets and the predictions to a metric function instead. Note that accuracy_score is a classification metric; for a regressor, use a regression metric such as r2_score(y_test, predictions).

from sklearn.tree import DecisionTreeRegressor #<---- FIRST ISSUE
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score


data = pd.read_csv('housing.csv')

data = data.dropna() #<--- SECOND ISSUE: drop NaNs before the split so X and y stay aligned

prices = data['median_house_value']
features = data.drop(['median_house_value', 'ocean_proximity'], axis = 1)

X_train, X_test, y_train, y_test = train_test_split(features, prices, test_size=0.2, random_state=42)

model = DecisionTreeRegressor()
model.fit(X_train, y_train)

predictions = model.predict(X_test)
score = r2_score(y_test, predictions) #<----- THIRD ISSUE: true targets first, then predictions
score
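
The second and third issues can also be checked without housing.csv. The following is a minimal sketch on synthetic data, not the original dataset: the column names rooms, income and price, and the linear price formula, are made up for illustration. It shows that calling dropna() on the features after the split removes rows that the target still contains, and that for a regressor model.score(X_test, y_test) gives the same R² value as r2_score(y_test, predictions).

```python
import numpy as np
import pandas as pd
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in for housing.csv (hypothetical columns, not the real data).
rng = np.random.default_rng(0)
data = pd.DataFrame({
    "rooms": rng.uniform(1, 10, size=200),
    "income": rng.uniform(1, 15, size=200),
})
data["price"] = 20_000 * data["rooms"] + 30_000 * data["income"]

# Inject missing values into one feature only (every 10th row, 20 rows in total).
data.loc[data.index % 10 == 0, "rooms"] = np.nan

# Dropping NaNs after the split: the features lose rows, the target does not.
X_tr, X_te, y_tr, y_te = train_test_split(
    data[["rooms", "income"]], data["price"], test_size=0.2, random_state=42
)
rows_lost_from_X = (len(X_tr) - len(X_tr.dropna())) + (len(X_te) - len(X_te.dropna()))
rows_lost_from_y = (len(y_tr) - len(y_tr.dropna())) + (len(y_te) - len(y_te.dropna()))

# Dropping NaNs once, before the split, keeps features and target aligned.
clean = data.dropna()
X_train, X_test, y_train, y_test = train_test_split(
    clean[["rooms", "income"]], clean["price"], test_size=0.2, random_state=42
)

model = DecisionTreeRegressor(random_state=0)
model.fit(X_train, y_train)
predictions = model.predict(X_test)

# For a regressor, model.score(X, y) computes R^2 from the features and true targets.
score_from_model = model.score(X_test, y_test)
# r2_score takes the true targets first, then the predictions.
score_from_metric = r2_score(y_test, predictions)

print(rows_lost_from_X, rows_lost_from_y)
print(score_from_model, score_from_metric)
```

Both scoring calls should print the same number, so either form works once the arguments are in the right order.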


