
Reproducing an OLS prediction from Python statsmodels

I've trained an OLS model in Python using statsmodels, with the code below:

import statsmodels.api as sm
X2 = sm.add_constant(lin_x_train)
est = sm.OLS(lin_y_train, X2)
est2 = est.fit()

Using est2.params I obtain the following parameters:

const       -0.394654
pow2         0.920915
eth_36hr    -0.028754
eth_24dhr   -0.068346
eth_16hr     0.064768
eth_72hr     0.001774
eth_48hr     0.001239
eth_24hr     0.026940
eth_2hr     -0.163568
eth_3hr     -0.042497
eth_4hr      0.033180
eth_5hr     -0.029850
eth_6hr     -0.040417

Now I want to predict the following case:

pow2         0
eth_36hr    2.91
eth_24dhr   1.34
eth_16hr    1.13
eth_72hr    13
eth_48hr    6.66
eth_24hr    -9.89
eth_2hr     -3.7
eth_3hr     2.37
eth_4hr     2.36
eth_5hr     -2.28
eth_6hr     -5.27

Since I've trained an OLS model, I assumed the prediction was simply:

y = a + B1*X1 + B2*X2 + ... + Bn*Xn

When I compute this myself I get a y value of 0.132. However, using:

Xnew = newcase
Xnew = sm.add_constant(Xnew)
est2.predict(Xnew) 

I get a value of 0.699.

What am I missing?

N.B. Using LinearRegression from sklearn I get the same value of 0.699, so I'm clearly missing something, but I can't get my head around it.

What I was missing was indeed quite simple and embarrassing: I had switched two variable names around, which made my manual predictions wrong. So the formula was correct:

y = a + B1*X1 + B2*X2 + ... + Bn*Xn
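One way to guard against this kind of mix-up is to pair each coefficient with its input by name rather than by position. The following is a minimal sketch, not the original poster's code: it uses the coefficients and case values exactly as listed in the question, and predict_by_name is a hypothetical helper introduced here for illustration. The question does not say which two labels were swapped, so the number it produces should be checked against est2.predict rather than taken as the correct prediction.

```python
# Coefficients as printed by est2.params in the question.
params = {
    "const": -0.394654, "pow2": 0.920915, "eth_36hr": -0.028754,
    "eth_24dhr": -0.068346, "eth_16hr": 0.064768, "eth_72hr": 0.001774,
    "eth_48hr": 0.001239, "eth_24hr": 0.026940, "eth_2hr": -0.163568,
    "eth_3hr": -0.042497, "eth_4hr": 0.033180, "eth_5hr": -0.029850,
    "eth_6hr": -0.040417,
}

# The new case, with labels exactly as listed in the question.
case = {
    "pow2": 0, "eth_36hr": 2.91, "eth_24dhr": 1.34, "eth_16hr": 1.13,
    "eth_72hr": 13, "eth_48hr": 6.66, "eth_24hr": -9.89, "eth_2hr": -3.7,
    "eth_3hr": 2.37, "eth_4hr": 2.36, "eth_5hr": -2.28, "eth_6hr": -5.27,
}


def predict_by_name(params, case, intercept="const"):
    """Hypothetical helper: y = a + sum(B_i * X_i), matched on names."""
    missing = set(params) - set(case) - {intercept}
    extra = set(case) - set(params)
    if missing or extra:
        # A misspelled or unknown label fails loudly instead of being
        # silently multiplied by the wrong coefficient.
        raise KeyError(f"missing={sorted(missing)}, extra={sorted(extra)}")
    return params[intercept] + sum(params[name] * case[name] for name in case)


y = predict_by_name(params, case)
print(round(y, 3))  # 0.132 with the values as listed above
```

Matching on names catches misspelled or missing labels, but it cannot detect two values entered under each other's names, so comparing the result against est2.predict on the same row remains the real check.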

Before discovering this, I worked around it by saving the model and loading it again to perform the predictions.

