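Getting 100% Accuracy on my DecisionTree Model
Here is my code. It always returns 100% accuracy, no matter how large the test size is. I used train_test_split, so I don't think there should be any duplicated data. Can someone check my code?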
from sklearn.tree import DecisionTreeClassifier
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
data = pd.read_csv('housing.csv')
prices = data['median_house_value']
features = data.drop(['median_house_value', 'ocean_proximity'], axis = 1)
prices.shape
(20640,)
features.shape
(20640, 8)
X_train, X_test, y_train, y_test = train_test_split(features, prices, test_size=0.2, random_state=42)
X_train = X_train.dropna()
y_train = y_train.dropna()
X_test = X_test.dropna()
y_test = X_test.dropna()
model = DecisionTreeClassifier()
model.fit(X_train, y_train)
y_train.shape
(16512,)
X_train.shape
(16512, 8)
predictions = model.predict(X_test)
score = model.score(y_test, predictions)
score
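EDIT: I have reworked my answer since I found multiple issues. Please copy and paste the code below to make sure no errors are left behind.
Issues -
1. You are using DecisionTreeClassifier instead of DecisionTreeRegressor to solve a regression problem.
2. You are dropping NaNs after the train/test split, which messes up the number of samples. Call data.dropna() before splitting.
3. You are using model.score(X_test, y_test) incorrectly by passing it (X_test, predictions). Because y_test was assigned from X_test.dropna(), the call scores the model against its own predictions. Pass the true targets instead, either as model.score(X_test, y_test) or as accuracy_score(y_test, predictions). Note that accuracy_score is a classification metric that counts exact matches, whereas model.score on a regressor returns R^2.
from sklearn.tree import DecisionTreeRegressor  # <---- FIRST ISSUE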
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
data = pd.read_csv('housing.csv')
data = data.dropna() #<--- SECOND ISSUE
prices = data['median_house_value']
features = data.drop(['median_house_value', 'ocean_proximity'], axis = 1)
X_train, X_test, y_train, y_test = train_test_split(features, prices, test_size=0.2, random_state=42)
model = DecisionTreeRegressor()
model.fit(X_train, y_train)
predictions = model.predict(X_test)
score = model.score(X_test, y_test)  # <----- THIRD ISSUE: score against the true targets (R^2 for a regressor)
score
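To see why the original score was always perfect, here is a minimal sketch on synthetic data. The columns rooms, income and price are made up for illustration and are not taken from housing.csv, so the numbers will not match the real dataset. It shows two things under those assumptions: dropping NaNs from the features only after splitting leaves features and targets with different row counts, and scoring a model against its own predictions is perfect by construction.

```python
import numpy as np
import pandas as pd
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in for housing.csv (hypothetical columns, not the real data).
rng = np.random.default_rng(0)
data = pd.DataFrame({
    "rooms": rng.uniform(1, 10, size=500),
    "income": rng.uniform(1, 15, size=500),
})
data["price"] = 20_000 * data["income"] + rng.normal(0, 30_000, size=500)
data.loc[data.index % 25 == 0, "rooms"] = np.nan  # inject missing values

# Dropping NaNs from the features after the split: X loses rows, y does not.
X_bad, _, y_bad, _ = train_test_split(
    data[["rooms", "income"]], data["price"], test_size=0.2, random_state=42
)
rows_after_late_dropna = len(X_bad.dropna())
rows_in_target = len(y_bad)

# Dropping NaNs before the split keeps features and targets aligned.
clean = data.dropna()
X_train, X_test, y_train, y_test = train_test_split(
    clean[["rooms", "income"]], clean["price"], test_size=0.2, random_state=42
)
model = DecisionTreeRegressor(random_state=0).fit(X_train, y_train)
predictions = model.predict(X_test)

# Comparing the model with its own predictions always looks perfect.
self_score = model.score(X_test, predictions)
# Comparing with the held-out targets gives an honest R^2 and error.
r2 = model.score(X_test, y_test)
mae = mean_absolute_error(y_test, predictions)

print(f"late dropna: {rows_after_late_dropna} feature rows vs {rows_in_target} targets")
print(f"score vs own predictions: {self_score:.3f}")
print(f"R^2 on held-out targets:  {r2:.3f}")
print(f"MAE on held-out targets:  {mae:.0f}")
```

mean_absolute_error is used here only as one example of a regression metric; which metric suits the housing data best is a separate choice.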