
Accuracy of multivariate classification and regression models with Scikit-Learn
Negative accuracy score in regression models with Scikit-Learn
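I wrote code to predict house prices. The problem is that I get negative accuracy scores. I used 5 different algorithms, and the accuracy scores are all over the place.
The first problem I ran into is that I get a warning when I use the .map function, but I don't think that is the issue.
The regression models run, but their training and test accuracies are all over the place. I also tried this:
from sklearn.metrics import accuracy_score
...
score_train = regression.accuracy_score(variables_train, result_train)
...
But it gives me this: AttributeError: 'LinearRegression' object has no attribute 'accuracy_score'
You can download the database from here:
https://www.sendspace.com/file/93nkdy
Here is the code: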
import pandas as pd
from sklearn import linear_model
from sklearn.svm import SVR
from sklearn.tree import DecisionTreeRegressor
from sklearn.model_selection import train_test_split
#pandas display options
pd.set_option('display.max_rows', 70)
pd.set_option('display.max_columns', 100)
pd.set_option('display.width', 1000)
data = pd.read_csv("validate.csv")
data = data.drop(columns = ["id"])
data = data.dropna(axis='columns')
data_for_pred = data[["bedrooms_total", "baths_total",
                      "sq_ft_tot_fn", "garage_capacity",
                      "city", "total_stories", "rooms_total",
                      "garage", "flood_zone", "price_closed"]]
#to see how many different values I have
cities = data_for_pred['city'].unique()
garage = data_for_pred['garage'].unique()
flood_zone = data_for_pred['flood_zone'].unique()
#mapping so that I can do my regression
data_for_pred['city'] = data_for_pred['city'].map({'Woodstock': 1, 'Barnard': 2, 'Pomfret': 3})
data_for_pred['garage'] = data_for_pred['garage'].map({'No': 0, 'Yes': 1})
data_for_pred['flood_zone'] = data_for_pred['flood_zone'].map({'Unknown': 0, 'Yes': 1, 'No': -1})
#print(data_for_pred)
def regression_model(bedrooms_num, baths_num, sq_ft_tot, garage_cap,
                     city, total_stor, rooms_tot, garage, flood_zone):
    classifiers = [
        ["Linear regression", linear_model.LinearRegression()],
        ["Support vector regression", SVR(gamma = 'auto')],
        ["Decision tree regression", DecisionTreeRegressor()],
        ["SVR - RBF", SVR(kernel = "rbf", C = 1e3, gamma = 0.1)],
        ["SVR - Linear regression", SVR(kernel = "linear", C = 1e0)]]
    variables = data_for_pred.iloc[:,:-1]
    results = data_for_pred.iloc[:,-1]
    predictionData = [bedrooms_num, baths_num, sq_ft_tot, garage_cap, city,
                      total_stor, rooms_tot, garage, flood_zone]
    info = ""
    for item in classifiers:
        regression = item[1]
        variables_train, variables_test, result_train, result_test = train_test_split(variables, results , test_size = 0.2, random_state = 4)
        regression.fit(variables_train, result_train)
        #Prediction
        prediction = regression.predict([predictionData])
        prediction = round(prediction[0], 2)
        #Accuracy of prediction
        score_train = regression.score(variables_train, result_train)
        score_train = round(score_train*100, 2)
        score_test = regression.score(variables_test, result_test)
        score_test = round(score_test*100, 2)
        info += str(item[0]) + " prediction: " + str(prediction) + " | Train accuracy: " + str(score_train) + "% | Test accuracy: " + str(score_test) + "%\n"
    return info
print(regression_model(7, 8, 4506, 0, 1, 2.00, 15, 0, 0)) #true value 375000
print(regression_model(8, 8, 5506, 0, 1, 2.00, 15, 0, 0)) #true value more than 375000
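Accuracy is defined for classification problems. Here, you have a regression problem.
The .score method of LinearRegression returns the coefficient of determination R^2 of the prediction, not the accuracy. From the documentation:
score(self, X, y[, sample_weight]): Returns the coefficient of determination R^2 of the prediction.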
https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LinearRegression.html
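To see why this score can be negative, it may help to compute R^2 by hand. The sketch below uses the standard definition R^2 = 1 - SS_res / SS_tot; the helper name r_squared and the three prices are made up for illustration and are not taken from the question's data.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    # Sum of squared errors of the model's predictions.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Sum of squared errors of a baseline that always predicts the mean.
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

y_true = [100.0, 200.0, 300.0]  # made-up target values

perfect = r_squared(y_true, [100.0, 200.0, 300.0])    # exact predictions
mean_only = r_squared(y_true, [200.0, 200.0, 200.0])  # always predict the mean
worse = r_squared(y_true, [300.0, 200.0, 100.0])      # worse than the mean

print(perfect, mean_only, worse)  # 1.0 0.0 -3.0
```

A negative value therefore means the model fits the evaluated data worse than a constant prediction of the mean, which is why it should not be read as a percentage accuracy.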
EDIT
If you are predicting labels (a classification problem), you can use this:
from sklearn.metrics import accuracy_score
scores_classification = accuracy_score(result_train, prediction)
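If you are predicting scalar values (a regression problem), which is your case, you should use a regression metric instead, for example:
from sklearn import metrics
scores_regr = metrics.mean_squared_error(y_true, y_pred)
All regression scoring methods are listed here: https://scikit-learn.org/stable/modules/classes.html#module-sklearn.metrics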
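As a rough illustration of how a few of these regression metrics behave, here is a minimal sketch on four made-up price pairs (the numbers are assumptions, not values from the question's dataset). It uses mean_absolute_error, mean_squared_error and r2_score from sklearn.metrics, and takes the square root of the MSE by hand to get an error in the same units as the price.

```python
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# Made-up true and predicted prices, for illustration only.
y_true = [300000.0, 350000.0, 400000.0, 450000.0]
y_pred = [310000.0, 340000.0, 420000.0, 430000.0]

mae = mean_absolute_error(y_true, y_pred)  # average absolute error, in price units
mse = mean_squared_error(y_true, y_pred)   # average squared error
rmse = mse ** 0.5                          # back in price units
r2 = r2_score(y_true, y_pred)              # 1.0 is perfect, can be negative

print(f"MAE:  {mae:.2f}")
print(f"MSE:  {mse:.2f}")
print(f"RMSE: {rmse:.2f}")
print(f"R^2:  {r2:.2f}")
```

MAE and RMSE are often easier to interpret for prices, since they are expressed in the same currency units as the target.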
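EDIT 2
Use (after importing mean_squared_error from sklearn.metrics, and comparing the training targets with predictions for the same training rows):
score_train = mean_squared_error(result_train, regression.predict(variables_train))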
This answer addresses LinearRegression() and the mean_squared_error() metric.
from sklearn import linear_model
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
variables_train, variables_test, result_train, result_test = train_test_split(variables, results, test_size=0.2)
regression = linear_model.LinearRegression()
regression.fit(variables_train, result_train)
prediction = regression.predict(variables_test)
mean_squared_score = mean_squared_error(result_test, prediction)
print("MEAN SQUARED SCORE", mean_squared_score)
Note: I left out some details of the question that I believe have no bearing on how evaluation metrics are used with regression models.
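The same evaluation can be run over several regressors in a loop, much as the question does. The following is only a sketch: it swaps in synthetic data from make_regression because the question's validate.csv is not reproduced here, and the names models and report are hypothetical. Each model is fitted on the training split and scored on the held-out test split with both R^2 and RMSE.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in for the housing data (an assumption for illustration).
X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)

# Split once, outside the loop, so every model sees the same data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=4)

models = {
    "Linear regression": LinearRegression(),
    "Decision tree regression": DecisionTreeRegressor(random_state=0),
}

report = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    report[name] = {
        "r2": r2_score(y_test, y_pred),
        "rmse": mean_squared_error(y_test, y_pred) ** 0.5,
    }

for name, scores in report.items():
    print(f"{name}: R^2 = {scores['r2']:.3f}, RMSE = {scores['rmse']:.2f}")
```

Reporting an error in the target's units alongside R^2 makes it easier to judge whether a model is usable, rather than reading R^2 as an accuracy percentage.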