Reproducing an OLS prediction from Python statsmodels

Question

I've trained an OLS model in Python using statsmodels, with the code below:

import statsmodels.api as sm
X2 = sm.add_constant(lin_x_train)
est = sm.OLS(lin_y_train, X2)
est2 = est.fit()

Using est2.params I obtain the following parameters:

const       -0.394654
pow2         0.920915
eth_36hr    -0.028754
eth_24dhr   -0.068346
eth_16hr     0.064768
eth_72hr     0.001774
eth_48hr     0.001239
eth_24hr     0.026940
eth_2hr     -0.163568
eth_3hr     -0.042497
eth_4hr      0.033180
eth_5hr     -0.029850
eth_6hr     -0.040417

Now I want to predict the following case:

pow2         0
eth_36hr    2.91
eth_24dhr   1.34
eth_16hr    1.13
eth_72hr    13
eth_48hr    6.66
eth_24hr    -9.89
eth_2hr     -3.7
eth_3hr     2.37
eth_4hr     2.36
eth_5hr     -2.28
eth_6hr     -5.27

Since I've trained an OLS model, I assumed the prediction was simply:

y = a + B1*X1 + B2*X2 + ... + Bn*Xn

When I compute this myself, I get a y value of 0.132. However, using:

Xnew = newcase
Xnew = sm.add_constant(Xnew)
est2.predict(Xnew)

I get a value of 0.699.

What am I missing?

N.B. Using LinearRegression from sklearn I get the same value of 0.699, so I'm clearly missing something, but I can't work out what.

Answer 1

What I was missing was quite simple, and a little embarrassing: I had switched two variable names around, so each of the two values was multiplied by the other's coefficient in my manual calculation. The formula itself was correct:

y = a + B1*X1 + B2*X2 + ... + Bn*Xn

Before I found the mistake, I worked around it by saving the fitted model and loading it again to make the predictions.
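For anyone checking a prediction by hand, here is a minimal sketch of the formula in plain Python. It matches each value to its coefficient by name rather than by position, which makes a swapped pair of variables harder to miss. The dictionaries `params` and `case` and the helper `predict_by_name` are illustrative names, not part of statsmodels. The numbers are copied exactly as listed in the question, so the result is the manual figure of about 0.132, not the 0.699 returned by est2.predict. Which two names were swapped is not stated, so the sketch does not try to reproduce 0.699.

```python
# Coefficients as reported by est2.params in the question.
params = {
    "const": -0.394654,
    "pow2": 0.920915,
    "eth_36hr": -0.028754,
    "eth_24dhr": -0.068346,
    "eth_16hr": 0.064768,
    "eth_72hr": 0.001774,
    "eth_48hr": 0.001239,
    "eth_24hr": 0.026940,
    "eth_2hr": -0.163568,
    "eth_3hr": -0.042497,
    "eth_4hr": 0.033180,
    "eth_5hr": -0.029850,
    "eth_6hr": -0.040417,
}

# The case exactly as listed in the question (i.e. with the name mix-up still in it).
case = {
    "pow2": 0,
    "eth_36hr": 2.91,
    "eth_24dhr": 1.34,
    "eth_16hr": 1.13,
    "eth_72hr": 13,
    "eth_48hr": 6.66,
    "eth_24hr": -9.89,
    "eth_2hr": -3.7,
    "eth_3hr": 2.37,
    "eth_4hr": 2.36,
    "eth_5hr": -2.28,
    "eth_6hr": -5.27,
}


def predict_by_name(params, case):
    """Return const + sum(coef * value), pairing values and coefficients by name."""
    names = [name for name in params if name != "const"]
    missing = set(names) - set(case)
    unexpected = set(case) - set(names)
    if missing or unexpected:
        # Fail loudly instead of silently pairing a value with the wrong coefficient.
        raise KeyError(
            f"name mismatch: missing={sorted(missing)}, unexpected={sorted(unexpected)}"
        )
    return params["const"] + sum(params[name] * case[name] for name in names)


y_manual = predict_by_name(params, case)
print(round(y_manual, 3))  # 0.132
```

If the hand-computed value still differs from est2.predict after a check like this, the remaining suspects are the values themselves rather than the formula.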

solution1
0 ACCEPTED 2018-06-15 09:00:59

Reproducing OLS prediction Python statsmodel

Question

1 answers

solution1 0 ACCPTED 2018-06-15 09:00:59

solution1
0 ACCPTED 2018-06-15 09:00:59