I'm doing a multivariable regression using statsmodels.api:
import statsmodels.api as sm
model = sm.regression.linear_model.OLS(dependent, X)
results = model.fit()
summary = results.summary()
where dependent is a vector of length n and X is a matrix of dimension m x n, where m is the number of factors.
Each component of X is a row vector whose first entry is the data label and whose next n entries are the data itself:
["revenue", 123,456,789.........514]
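A minimal sketch, assuming each factor is stored as a plain Python list in the layout shown above (label first, then the n observations). It separates the labels from the numbers and transposes the data so that each row is one observation, which is the orientation OLS expects for its exog argument. The helper name split_labelled_rows and the sample rows are hypothetical.

```python
def split_labelled_rows(rows):
    """Split [label, v1, ..., vn] rows into (labels, observation-major data).

    Returns the list of labels and a list of n rows, each holding one value
    per factor, so np.asarray(data) would have shape (n, number_of_factors).
    """
    labels = [row[0] for row in rows]
    values = [row[1:] for row in rows]
    if len({len(v) for v in values}) > 1:
        raise ValueError("all factors must have the same number of observations")
    # zip(*values) walks the factors in parallel, yielding one observation at a time
    data = [list(obs) for obs in zip(*values)]
    return labels, data


# Hypothetical sample rows in the layout from the question
rows = [
    ["revenue", 123, 456, 789],
    ["const", 1, 1, 1],
    ["cost", 10, 20, 35],
]
labels, data = split_labelled_rows(rows)
print(labels)  # ['revenue', 'const', 'cost']
print(data)    # [[123, 1, 10], [456, 1, 20], [789, 1, 35]]
```

Because the order of labels matches the columns of the transposed data, the same list can later be passed as the parameter names (for example to summary(xname=...)).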
Printing the summary gives:
                            OLS Regression Results
==============================================================================
Dep. Variable:                      y   R-squared:                       0.993
Model:                            OLS   Adj. R-squared:                  0.987
Method:                 Least Squares   F-statistic:                     159.7
Date:                Fri, 25 Oct 2013   Prob (F-statistic):           1.99e-31
Time:                        12:14:19   Log-Likelihood:                -730.93
No. Observations:                  71   AIC:                             1530.
Df Residuals:                      37   BIC:                             1607.
Df Model:                          34
==============================================================================
              coef     std err           t       P>|t|      [95.0% Conf. Int.]
------------------------------------------------------------------------------
const     699.6533     421.414       1.660       0.105    -154.212    1553.519
x1        131.5725     266.202       0.494       0.624    -407.803     670.948
x2      -5186.9570    1.04e+04      -0.499       0.621   -2.63e+04    1.59e+04
x3       2.897e+04    1.51e+04       1.925       0.062   -1525.292    5.95e+04
x4          0.7279       0.373       1.950       0.059      -0.029       1.484
x5      -2.794e+05    4.41e+05      -0.634       0.530   -1.17e+06    6.14e+05
x6      -2500.4833    1533.499      -1.631       0.111   -5607.647     606.680
x7       2.202e+04    1.71e+04       1.290       0.205   -1.26e+04    5.66e+04
x8          5.9603       2.597       2.296       0.027       0.699      11.221
x9       -1.41e+07    1.04e+07      -1.354       0.184   -3.52e+07    7.01e+06
x10        -0.3980       0.561      -0.710       0.482      -1.534       0.738
x11      8.862e+04     8.4e+04       1.055       0.298   -8.16e+04    2.59e+05
x12      6.851e+04    4.81e+04       1.426       0.162   -2.89e+04    1.66e+05
x13      1.189e+08    7.23e+07       1.645       0.108   -2.75e+07    2.65e+08
x14      -531.5723     688.333      -0.772       0.445   -1926.268     863.123
x15       290.7228    9702.296       0.030       0.976   -1.94e+04    1.99e+04
x16     -4316.1159    1235.718      -3.493       0.001   -6819.919   -1812.313
x17        -1.0480      18.339      -0.057       0.955     -38.206      36.110
x18         0.4967       1.108       0.448       0.657      -1.749       2.743
x19      -512.3132     680.352      -0.753       0.456   -1890.838     866.211
x20     -6.174e+05    4.15e+05      -1.489       0.145   -1.46e+06    2.23e+05
x21       -20.1921       9.588      -2.106       0.042     -39.620      -0.764
x22     -1109.1907     868.787      -1.277       0.210   -2869.520     651.139
x23     -3.275e-05    1.74e-05      -1.888       0.067   -6.79e-05    2.41e-06
x24     -3.046e+04    1.87e+04      -1.630       0.112   -6.83e+04    7396.892
x25     -8255.2473    4228.299      -1.952       0.058   -1.68e+04     312.100
x26        -0.4144       0.165      -2.515       0.016      -0.748      -0.081
x27     -3.779e+07    2.33e+07      -1.622       0.113    -8.5e+07    9.43e+06
x28      -672.3038    9934.991      -0.068       0.946   -2.08e+04    1.95e+04
x29      1.271e+05    4.71e+04       2.696       0.010    3.16e+04    2.23e+05
x30        11.2359       5.247       2.141       0.039       0.604      21.868
x31      -2.58e+05    8.63e+05      -0.299       0.767   -2.01e+06    1.49e+06
x32     -5.362e+04    2.66e+04      -2.014       0.051   -1.08e+05     318.991
x33        11.7349       6.720       1.746       0.089      -1.880      25.350
x34      -1.71e+06    1.25e+07      -0.137       0.892   -2.71e+07    2.37e+07
x35        -7.6490       8.019      -0.954       0.346     -23.897       8.600
x36       291.4046     178.169       1.636       0.110     -69.601     652.410
x37       510.0672     318.445       1.602       0.118    -135.164    1155.298
==============================================================================
Omnibus:                        3.382   Durbin-Watson:                   1.864
Prob(Omnibus):                  0.184   Jarque-Bera (JB):                2.615
Skew:                          -0.441   Prob(JB):                        0.271
Kurtosis:                       3.324   Cond. No.                          nan
==============================================================================
Running print results.params gives:
[ 6.99653265e+02 1.31572465e+02 -5.18695704e+03 2.89725201e+04
7.27866154e-01 -2.79412892e+05 -2.50048329e+03 2.20188260e+04
5.96032414e+00 -1.40983228e+07 -3.98040736e-01 8.86220943e+04
6.85055661e+04 1.18927196e+08 -5.31572322e+02 2.90722839e+02
-4.31611590e+03 -1.04803807e+00 4.96741935e-01 -5.12313204e+02
-6.17414913e+05 -2.01921161e+01 -1.10919070e+03 -3.27489243e-05
-3.04625838e+04 -8.25524731e+03 -4.14444321e-01 -3.77917370e+07
-6.72303755e+02 1.27068811e+05 1.12359266e+01 -2.57978901e+05
-5.36154172e+04 1.17349174e+01 -1.71045966e+06 -7.64895526e+00
2.91404563e+02 5.10067167e+02]
where the first entry, 699.6533, is the coefficient of the constant term, and so on up to x37.
My problem is that the const term does not always appear in the same position in the summary (it is not necessarily first). I need a way to either (a) label each factor with the label in the first position of its vector, or (b) always identify which entry in the summary (and hence in params) corresponds to the const term.
I would like to do this without using an additional package like pandas.
Please help.
Thank you!
You can find the constant position in
results.model.data.const_idx
If you do use pandas, then you could do
results.params['const']
But without it, you'll have to rely on
results.params[results.model.data.const_idx]
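Without pandas, one way to get name-based access anyway is to pair each parameter name with its estimate in a plain dict. The sketch below is a hypothetical helper that takes the names (for example results.model.data.xnames) and the estimates (results.params) as ordinary sequences, so it does not matter at which position the constant ends up. The stand-in values are taken from the res_olsg summary shown further down.

```python
def params_by_name(names, params):
    """Map each parameter name to its estimate, e.g. {'const': 699.6533, ...}."""
    names = list(names)
    params = list(params)
    if len(names) != len(params):
        raise ValueError("got %d names for %d parameters" % (len(names), len(params)))
    if len(set(names)) != len(names):
        raise ValueError("parameter names must be unique to be used as keys")
    # zip pairs names and estimates by position, the same order as in the summary
    return dict(zip(names, params))


# Stand-in values; with a fitted model you would pass
# results.model.data.xnames and results.params instead.
coefs = params_by_name(["x1", "x2", "const"], [4.3742, -0.6140, -9.4817])
print(coefs["const"])  # -9.4817
```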
You can overwrite the default names for the parameters either on the model or in the summary:
model.data.xnames = my_xnames
or
results.summary(xname=my_xnames)
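If you only have labels for the non-constant columns (for example because the constant column was not part of your labelled data), a hypothetical helper like the one below can splice 'const' in at the detected position before the list is passed as xname. It assumes const_idx is either a single integer position or None, which is how results.model.data.const_idx is treated in this sketch.

```python
def names_with_const(factor_labels, const_idx):
    """Return factor_labels with 'const' inserted at position const_idx.

    const_idx is assumed to be an int (position of the constant column)
    or None (model has no constant), like results.model.data.const_idx.
    """
    names = list(factor_labels)
    if const_idx is not None:
        names.insert(const_idx, "const")
    return names


print(names_with_const(["revenue", "cost"], 2))     # ['revenue', 'cost', 'const']
print(names_with_const(["revenue", "cost"], 0))     # ['const', 'revenue', 'cost']
print(names_with_const(["revenue", "cost"], None))  # ['revenue', 'cost']
```

Usage would then look like results.summary(xname=names_with_const(my_labels, results.model.data.const_idx)), where my_labels is your own list of factor labels.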
When a model is used with plain numpy arrays, default names are created for the variables. The array of explanatory variables, X, is also checked for the presence of a constant. I thought this would assign the default name 'const' to the correct line in the summary, but there might be a missing connection between detecting the constant and creating the parameter names.
Update:
The default name creation checks whether there is a column with zero variance and assigns the 'const' label to it:
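def _make_exog_names(exog):
    # exog is a 2-D numpy array with one column per explanatory variable
    exog_var = exog.var(0)
    if (exog_var == 0).any():
        # assumes one constant in first or last position
        # avoid exception if more than one constant
        const_idx = exog_var.argmin()
        exog_names = ['x%d' % i for i in range(1, exog.shape[1])]
        exog_names.insert(const_idx, 'const')
    else:
        # no zero-variance column: name the columns x1 ... xk
        exog_names = ['x%d' % i for i in range(1, exog.shape[1] + 1)]
    return exog_names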
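The same zero-variance idea can be applied to the data before it reaches statsmodels. A minimal sketch, assuming the row-per-factor layout from the question: it reports every factor whose values are all equal (which is what zero variance means), instead of assuming a single constant the way argmin does. The function name constant_factors and the sample rows are hypothetical.

```python
def constant_factors(rows):
    """Return (index, label) for every [label, v1, ..., vn] row with zero variance.

    A row has zero variance exactly when all of its data values are equal.
    """
    found = []
    for i, row in enumerate(rows):
        values = row[1:]
        if values and all(v == values[0] for v in values):
            found.append((i, row[0]))
    return found


rows = [
    ["revenue", 123, 456, 789],
    ["const", 1, 1, 1],
    ["cost", 10, 20, 35],
]
print(constant_factors(rows))  # [(1, 'const')]
```

Since each factor row becomes one column of the exog matrix after transposing, the returned index is also the column position of the constant.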
This name creation is independent of the const_idx attribute; no connection is made between the two.
If you have an example where the location of the constant is not identified correctly this way, you should open an issue with statsmodels on GitHub.
There is no nice setter method for changing the names, but assigning to model.data.xnames works for me:
>>> res_olsg.model.exog_names
['x1', 'x2', 'const']
>>> res_olsg.model.data.xnames
['x1', 'x2', 'const']
Changing xnames:
>>> res_olsg.model.data.xnames = ['x1', 'x2', 'not a const']
>>> res_olsg.model.exog_names
['x1', 'x2', 'not a const']
exog_names is read-only:
>>> res_olsg.model.exog_names= ['x1', 'x2', 'x3']
Traceback (most recent call last):
File "<pyshell#12>", line 1, in <module>
res_olsg.model.exog_names= ['x1', 'x2', 'x3']
AttributeError: can't set attribute
>>> print res_olsg.summary()
                            OLS Regression Results
...
===============================================================================
                    coef    std err          t      P>|t|    [95.0% Conf. Int.]
-------------------------------------------------------------------------------
x1                4.3742      0.215     20.374      0.000      3.951      4.798
x2               -0.6140      0.285     -2.157      0.032     -1.175     -0.053
not a const      -9.4817      1.068     -8.874      0.000    -11.589     -7.375
==============================================================================
...
Got it:
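def const_i(results):
    # search the parameter names for the default 'const' label
    x = results.model.data.xnames
    for i, name in enumerate(x):
        if name == "const":
            return i
    return None  # no parameter is labelled 'const'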
This returns the index of the constant term in results.params, or None if no parameter is labelled 'const'.
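A name search only works while the default 'const' label is still in place; as the 'not a const' example above shows, the names can be overwritten. The hypothetical helper below, a sketch rather than a statsmodels function, looks for the label first and falls back to a detected index passed in by the caller (for example results.model.data.const_idx).

```python
def const_index(names, const_idx=None):
    """Position of the constant among the parameters, or None if there is none.

    Looks for the default 'const' label first; if the names were overwritten
    (e.g. to 'not a const'), falls back to the index passed in as const_idx.
    """
    names = list(names)
    if "const" in names:
        return names.index("const")
    return const_idx


print(const_index(["x1", "x2", "const"]))           # 2
print(const_index(["x1", "x2", "not a const"], 2))  # 2
print(const_index(["x1", "x2", "x3"]))              # None
```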