
Linear Regression - Not correct output

I have a dataset with two columns, ["A", "B"], where "A" is the input variable and "B" is the target variable. All values are integers.

My code:

X.shape
>>(2540, 1)

y.shape
>>(2540, 1)

from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
scaler.fit(X)
X = scaler.transform(X)

import numpy as np
from sklearn.model_selection import train_test_split
np.random.seed(4)  # seed NumPy's global RNG so the split below is reproducible
X_train, X_test, y_train, y_test  = train_test_split(X,y,test_size = 0.2)

Linear Regression from Sklearn

from sklearn.linear_model import LinearRegression

regr = LinearRegression(fit_intercept=True)
regr.fit(X_train, y_train)  

print ('Coefficients: ', regr.coef_)
print ('Intercept: ',regr.intercept_)          
>>Coefficients:  [[43.95569425]]
>>Intercept:  [100.68681298]

I got an R² value of 0.93.

The last record in X_train is 3687, and the corresponding y_train value is 212.220001.

I used that record for prediction, like this:

regr.predict([[3687]] )
>>array([161825.22279211])

I do not understand what is happening. I expected the predicted value to be around 212.

But the predicted value is 161825.

Could you please explain the reason? Thanks.

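Perhaps you need to pass your input through the scaler before feeding it to the regression. The model was fitted on standardized values of X, so its coefficient is expressed per standard deviation of "A", not per raw unit; multiplying it by the raw value 3687 gives a number in the hundreds of thousands. Try regr.predict(scaler.transform([[3687]])). Note the double brackets: both the scaler and the model expect a 2D array.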
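To illustrate the point above, here is a minimal sketch. It does not use the asker's data: the X and y values below are synthetic stand-ins generated for the example, and the slope, intercept, and noise level are assumptions chosen only so the target lands near 200. It compares predicting on the raw value with predicting on the scaled value, and also shows one way to avoid the mistake by wrapping the scaler and the model in a scikit-learn pipeline.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the two-column dataset (hypothetical values).
rng = np.random.default_rng(0)
X = rng.integers(0, 4000, size=(2540, 1)).astype(float)
y = 0.05 * X + 30 + rng.normal(0, 5, size=(2540, 1))

# Same steps as in the question: standardize X, then fit on the scaled values.
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

regr = LinearRegression(fit_intercept=True)
regr.fit(X_scaled, y)

raw = np.array([[3687.0]])  # 2D input: one sample, one feature

# Mistake: a raw value is fed to a model that was trained on scaled inputs.
wrong = regr.predict(raw)

# Fix: apply the already-fitted scaler to the new input first.
right = regr.predict(scaler.transform(raw))

# Alternative: a pipeline applies the scaling automatically at predict time.
pipe = make_pipeline(StandardScaler(), LinearRegression())
pipe.fit(X, y)
piped = pipe.predict(raw)

print("raw input, no scaling:", wrong.item())
print("scaled input:         ", right.item())
print("pipeline:             ", piped.item())
```

With the pipeline, the raw value can be passed to predict directly, which removes the need to remember the transform step. In a real workflow it is also common to fit the scaler on the training split only, which the pipeline handles if it is fitted on X_train and y_train.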
