
Negative sklearn linear regression score

I am trying to build a house price prediction model with sklearn's LinearRegression, but I am getting a negative score.

What am I doing wrong?

Dataset: this is the dataset

Screenshot of dataset: (image)

Please see the details below:

Shape of dataframe: (23435, 190)

Code:

    import pandas as pd
    from sklearn.linear_model import LinearRegression
    from sklearn.model_selection import ShuffleSplit, cross_val_score, train_test_split

    properties_five = pd.read_csv('house_test.csv')

    # Separate the features from the target column.
    X = properties_five.drop('price', axis='columns')
    y = properties_five['price']

    # Hold out 20% of the rows for testing.
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=10)

    lr_clf = LinearRegression()
    lr_clf.fit(X_train, y_train)
    print(lr_clf.score(X_train, y_train))  # R² on the training data
    print(lr_clf.score(X_test, y_test))    # R² on the test data

    # Cross-validate with 5 random 80/20 splits.
    cv = ShuffleSplit(n_splits=5, test_size=0.2, random_state=0)
    print(cross_val_score(LinearRegression(), X, y, cv=cv))

Score on training data: 0.0025884591059242013

Score on test data: -1.6566338615525985e+24

Your code seems fine, except that the line df = pd.read_csv('house_test.csv') should probably be properties_five = pd.read_csv('house_test.csv') to match the lines that follow.

When I run it on this data set, I get the following output:

0.7307587542204755
0.465770160153375
[0.64358885 0.67211318 0.67817097 0.53631898 0.67390831]

Perhaps linear regression simply performs poorly on your data set, or else your data set contains errors. A negative R² score means that you would be better off using "constant regression", that is, always predicting the mean of y.
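To make the meaning of a negative score concrete, here is a minimal sketch of the R² formula that LinearRegression.score reports, written in plain Python. The helper name r2_score_manual and the toy numbers are made up for illustration and are not taken from the asker's data.

```python
def r2_score_manual(y_true, y_pred):
    """R² = 1 - SS_res / SS_tot (hypothetical helper for illustration)."""
    mean_y = sum(y_true) / len(y_true)
    # Squared error of the model's predictions.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Squared error of always predicting the mean of y.
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Toy prices, invented for this example.
y_true = [100.0, 200.0, 300.0, 400.0]
mean_prediction = [250.0] * 4                  # "constant regression"
bad_prediction = [400.0, 300.0, 200.0, 100.0]  # worse than the mean

r2_mean = r2_score_manual(y_true, mean_prediction)
r2_bad = r2_score_manual(y_true, bad_prediction)
print(r2_mean)  # 0.0
print(r2_bad)   # -3.0
```

Predicting the mean gives exactly 0, and any model whose squared error exceeds that baseline goes below 0, with no lower bound. A value on the order of -1e24 therefore suggests test predictions that are wildly off in magnitude, not just a slightly weak fit.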

Please share your outputs. Linear regression is also sensitive to outliers, so you should standardize the numerical variables.
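One way to try the standardization suggestion is to put a StandardScaler in front of the regression inside a pipeline, so the scaler is fit only on each training fold. The sketch below uses synthetic data as a stand-in (an assumption, since the asker's house_test.csv is not available), with feature scales that deliberately differ by orders of magnitude. Whether this helps on the real data is not guaranteed: in exact arithmetic, ordinary least squares with an intercept gives the same predictions after rescaling, so any gain would come from better numerical conditioning.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import ShuffleSplit, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (assumption: NOT the asker's data set).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3)) * np.array([1.0, 1000.0, 0.001])
y = X @ np.array([2.0, 0.003, 500.0]) + rng.normal(scale=0.1, size=200)

# Scale each feature to zero mean and unit variance, then fit the regression.
model = make_pipeline(StandardScaler(), LinearRegression())

# Same cross-validation setup as in the question.
cv = ShuffleSplit(n_splits=5, test_size=0.2, random_state=0)
scores = cross_val_score(model, X, y, cv=cv)
print(scores)
```

With the real data, X and y would be the feature frame and price column from the question, passed to cross_val_score in place of the synthetic arrays.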

You read the file into a variable named df, so in the lines that follow you should replace properties_five with df. Also try standardizing or normalizing the dataset, which I hope will help reduce the error; for example, here you can find details.

