
Negative SKlearn linear regression score

I am trying to build a house price prediction model with sklearn linear regression and I am getting a negative score.

What am I doing wrong?

dataset:

this is the dataset

Screenshot of dataset: [image]

Please see below details:

Shape of dataframe: (23435, 190)

Code:

    import pandas as pd
    from sklearn.model_selection import train_test_split
    from sklearn.linear_model import LinearRegression
    from sklearn.model_selection import ShuffleSplit
    from sklearn.model_selection import cross_val_score

    properties_five = pd.read_csv('house_test.csv')
    
    X = properties_five.drop('price', axis='columns')
    y = properties_five['price']
    
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=10)
    
    lr_clf = LinearRegression()
    lr_clf.fit(X_train, y_train)
    print(lr_clf.score(X_train,y_train))
    print(lr_clf.score(X_test,y_test))
    
    cv = ShuffleSplit(n_splits=5, test_size=0.2, random_state=0)
    
    print(cross_val_score(LinearRegression(), X, y, cv=cv))

score on training data: 0.0025884591059242013

score on test data: -1.6566338615525985e+24

Your code seems fine - except the line df = pd.read_csv('house_test.csv') should probably be properties_five = pd.read_csv('house_test.csv') to match the next lines.

When I run it on this data set, I get the following output:

0.7307587542204755
0.465770160153375
[0.64358885 0.67211318 0.67817097 0.53631898 0.67390831]

Perhaps linear regression simply performs poorly on your data set, or your data set contains errors. A negative R² score means the model does worse than "constant regression", that is, always predicting the mean of y.
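To make the meaning of a negative R² concrete, here is a minimal sketch that computes R² by hand as 1 - SS_res / SS_tot and compares it with sklearn.metrics.r2_score. The target values and the helper name r2_manual are made up for illustration; they are not from the asker's data.

```python
import numpy as np
from sklearn.metrics import r2_score

# Made-up target values, for illustration only.
y_true = np.array([3.0, 5.0, 7.0, 9.0])

def r2_manual(y_true, y_pred):
    """R^2 = 1 - (residual sum of squares) / (total sum of squares)."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

# "Constant regression": always predict the mean of y. Its R^2 is 0.
mean_pred = np.full_like(y_true, y_true.mean())

# Predictions that are worse than the mean (here, the targets reversed).
bad_pred = np.array([9.0, 7.0, 5.0, 3.0])

r2_mean = r2_manual(y_true, mean_pred)
r2_bad = r2_manual(y_true, bad_pred)

print(r2_mean)                     # 0.0
print(r2_bad)                      # -3.0
print(r2_score(y_true, bad_pred))  # same value as r2_bad
```

R² has no lower bound, so a score as extreme as -1.66e+24 is possible when a few test predictions are enormously far from the true values.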

Please share your outputs. Also, linear regression is sensitive to outliers, so you should standardize the numerical variables.

You read the file into a variable named df, so on the very next line you should replace properties_five with df. Also try to standardize or normalize the dataset; I hope it will help reduce the error. For example, you can find details here.
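One way to try the standardization suggested above is to put StandardScaler and LinearRegression in a single pipeline, so the scaler is fitted only on the training part of each split. This is a sketch on synthetic data from make_regression (an assumption, since house_test.csv is not available here). For plain least squares, scaling is not guaranteed to change the score, so treat it as something to test rather than a confirmed fix.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import ShuffleSplit, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the house price data (assumption for illustration).
X, y = make_regression(n_samples=500, n_features=10, noise=10.0, random_state=0)

# The scaler is refitted on the training portion of every split,
# so no information from the held-out portion leaks into the scaling.
model = make_pipeline(StandardScaler(), LinearRegression())

# Same cross-validation setup as in the question.
cv = ShuffleSplit(n_splits=5, test_size=0.2, random_state=0)
scores = cross_val_score(model, X, y, cv=cv)
print(scores)
```

With the real data, X and y from the question can be passed to cross_val_score in place of the synthetic arrays.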
