Negative sklearn linear regression score

Question

I am trying to build a house price prediction model with sklearn linear regression, and I am getting a negative score.

What am I doing wrong?

Dataset: this is the dataset

[Screenshot of dataset]

Details are below.

Shape of dataframe: (23435, 190)

Code:

    import pandas as pd
    from sklearn.linear_model import LinearRegression
    from sklearn.model_selection import ShuffleSplit, cross_val_score, train_test_split

    properties_five = pd.read_csv('house_test.csv')

    # Separate the features from the target column.
    X = properties_five.drop('price', axis='columns')
    y = properties_five['price']

    # Hold out 20% of the rows for testing.
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=10)

    # Fit the model and print R² on the training and test sets.
    lr_clf = LinearRegression()
    lr_clf.fit(X_train, y_train)
    print(lr_clf.score(X_train, y_train))
    print(lr_clf.score(X_test, y_test))

    # Cross-validate with five random 80/20 splits.
    cv = ShuffleSplit(n_splits=5, test_size=0.2, random_state=0)

    print(cross_val_score(LinearRegression(), X, y, cv=cv))

Score on training data: 0.0025884591059242013

Score on test data: -1.6566338615525985e+24

Answer 1

Your code seems fine, except that the line df = pd.read_csv('house_test.csv') should probably be properties_five = pd.read_csv('house_test.csv') to match the lines that follow.

When I run it on this data set, I get the following output:

0.7307587542204755
0.465770160153375
[0.64358885 0.67211318 0.67817097 0.53631898 0.67390831]

Perhaps linear regression simply performs poorly on your data set, or your data set contains errors. A negative R² score means that the model predicts worse than "constant regression", that is, always predicting the mean of y.
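To make the meaning of a negative R² concrete, here is a minimal sketch on made-up numbers (the arrays below are illustrative and are not taken from the asker's data). It compares the constant mean predictor, via sklearn's DummyRegressor, with a deliberately bad set of predictions, and recomputes the score by hand as 1 - SS_res / SS_tot.

```python
import numpy as np
from sklearn.dummy import DummyRegressor
from sklearn.metrics import r2_score

# Toy targets (hypothetical values, chosen only for illustration).
y_true = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
X_dummy = np.zeros((5, 1))  # DummyRegressor ignores the features

# "Constant regression": always predict the mean of y.
baseline = DummyRegressor(strategy="mean").fit(X_dummy, y_true)
r2_constant = r2_score(y_true, baseline.predict(X_dummy))

# Deliberately bad predictions that run opposite to the targets.
y_bad = np.array([5.0, 4.0, 3.0, 2.0, 1.0])
r2_bad = r2_score(y_true, y_bad)

# The same score by hand: R² = 1 - SS_res / SS_tot.
ss_res = np.sum((y_true - y_bad) ** 2)          # 40.0
ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # 10.0
r2_manual = 1.0 - ss_res / ss_tot

print(r2_constant)  # 0.0: the mean predictor is the zero point of the scale
print(r2_bad)       # -3.0: worse than predicting the mean
print(r2_manual)    # -3.0
```

R² has no lower bound, so a score as extreme as -1.66e+24 is possible whenever the test predictions are enormously far from the true values.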

Answer 2

Please share your outputs. Also, linear regression is sensitive to outliers, so you should standardize the numerical variables.

Answer 3

You read the file into a variable named df, so in the very next line you should replace properties_five with df. Also try standardizing or normalizing the dataset, which I hope will help reduce the error; for example, you can find details here.
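One way to try the standardization suggested here is to put a StandardScaler in front of the regressor in a pipeline, so that the scaling is fitted on the training part of each split only. The sketch below uses synthetic data from make_regression as a stand-in, since house_test.csv is not available here; the sample sizes and noise level are arbitrary assumptions. Whether scaling fixes the asker's score cannot be shown by this example.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import ShuffleSplit, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the house data (assumption: not the real CSV).
X, y = make_regression(n_samples=500, n_features=10, noise=10.0, random_state=0)

# The scaler is refitted on the training portion of every split,
# so no information from the held-out rows leaks into the scaling.
model = make_pipeline(StandardScaler(), LinearRegression())

# Same cross-validation scheme as in the question.
cv = ShuffleSplit(n_splits=5, test_size=0.2, random_state=0)
scores = cross_val_score(model, X, y, cv=cv)
print(scores)
```

With the real data, X and y would be the feature frame and the price column from the question, and the pipeline could be passed to cross_val_score in place of LinearRegression().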


Answer 1
2 2020-09-05 22:30:16

Answer 2
0 2020-09-06 08:21:22

Answer 3
0 2020-09-06 12:50:29
