
Remove const or label variables using statsmodels.api

I'm doing a multi-variable regression using statsmodels.api:

import statsmodels.api as sm

model = sm.regression.linear_model.OLS(dependent, X)
results = model.fit()
summary = results.summary()

where dependent is a vector of length n and X is a matrix of dimension m x n, where m is the number of factors.

Each component of X is a row vector whose first entry is the data label and next n entries are the data itself:

["revenue", 123,456,789.........514]

printing the summary gives:

                           OLS Regression Results                            
==============================================================================
Dep. Variable:                      y   R-squared:                       0.993
Model:                            OLS   Adj. R-squared:                  0.987
Method:                 Least Squares   F-statistic:                     159.7
Date:                Fri, 25 Oct 2013   Prob (F-statistic):           1.99e-31
Time:                        12:14:19   Log-Likelihood:                -730.93
No. Observations:                  71   AIC:                             1530.
Df Residuals:                      37   BIC:                             1607.
Df Model:                          34                                         
==============================================================================
                 coef    std err          t      P>|t|      [95.0% Conf. Int.]
------------------------------------------------------------------------------
const        699.6533    421.414      1.660      0.105      -154.212  1553.519
x1           131.5725    266.202      0.494      0.624      -407.803   670.948
x2         -5186.9570   1.04e+04     -0.499      0.621     -2.63e+04  1.59e+04
x3          2.897e+04   1.51e+04      1.925      0.062     -1525.292  5.95e+04
x4             0.7279      0.373      1.950      0.059        -0.029     1.484
x5         -2.794e+05   4.41e+05     -0.634      0.530     -1.17e+06  6.14e+05
x6         -2500.4833   1533.499     -1.631      0.111     -5607.647   606.680
x7          2.202e+04   1.71e+04      1.290      0.205     -1.26e+04  5.66e+04
x8             5.9603      2.597      2.296      0.027         0.699    11.221
x9          -1.41e+07   1.04e+07     -1.354      0.184     -3.52e+07  7.01e+06
x10           -0.3980      0.561     -0.710      0.482        -1.534     0.738
x11         8.862e+04    8.4e+04      1.055      0.298     -8.16e+04  2.59e+05
x12         6.851e+04   4.81e+04      1.426      0.162     -2.89e+04  1.66e+05
x13         1.189e+08   7.23e+07      1.645      0.108     -2.75e+07  2.65e+08
x14         -531.5723    688.333     -0.772      0.445     -1926.268   863.123
x15          290.7228   9702.296      0.030      0.976     -1.94e+04  1.99e+04
x16        -4316.1159   1235.718     -3.493      0.001     -6819.919 -1812.313
x17           -1.0480     18.339     -0.057      0.955       -38.206    36.110
x18            0.4967      1.108      0.448      0.657        -1.749     2.743
x19         -512.3132    680.352     -0.753      0.456     -1890.838   866.211
x20        -6.174e+05   4.15e+05     -1.489      0.145     -1.46e+06  2.23e+05
x21          -20.1921      9.588     -2.106      0.042       -39.620    -0.764
x22        -1109.1907    868.787     -1.277      0.210     -2869.520   651.139
x23        -3.275e-05   1.74e-05     -1.888      0.067     -6.79e-05  2.41e-06
x24        -3.046e+04   1.87e+04     -1.630      0.112     -6.83e+04  7396.892
x25        -8255.2473   4228.299     -1.952      0.058     -1.68e+04   312.100
x26           -0.4144      0.165     -2.515      0.016        -0.748    -0.081
x27        -3.779e+07   2.33e+07     -1.622      0.113      -8.5e+07  9.43e+06
x28         -672.3038   9934.991     -0.068      0.946     -2.08e+04  1.95e+04
x29         1.271e+05   4.71e+04      2.696      0.010      3.16e+04  2.23e+05
x30           11.2359      5.247      2.141      0.039         0.604    21.868
x31         -2.58e+05   8.63e+05     -0.299      0.767     -2.01e+06  1.49e+06
x32        -5.362e+04   2.66e+04     -2.014      0.051     -1.08e+05   318.991
x33           11.7349      6.720      1.746      0.089        -1.880    25.350
x34         -1.71e+06   1.25e+07     -0.137      0.892     -2.71e+07  2.37e+07
x35           -7.6490      8.019     -0.954      0.346       -23.897     8.600
x36          291.4046    178.169      1.636      0.110       -69.601   652.410
x37          510.0672    318.445      1.602      0.118      -135.164  1155.298
==============================================================================
Omnibus:                        3.382   Durbin-Watson:                   1.864
Prob(Omnibus):                  0.184   Jarque-Bera (JB):                2.615
Skew:                          -0.441   Prob(JB):                        0.271
Kurtosis:                       3.324   Cond. No.                          nan
==============================================================================

print results.params gives:

[  6.99653265e+02   1.31572465e+02  -5.18695704e+03   2.89725201e+04
   7.27866154e-01  -2.79412892e+05  -2.50048329e+03   2.20188260e+04
   5.96032414e+00  -1.40983228e+07  -3.98040736e-01   8.86220943e+04
   6.85055661e+04   1.18927196e+08  -5.31572322e+02   2.90722839e+02
  -4.31611590e+03  -1.04803807e+00   4.96741935e-01  -5.12313204e+02
  -6.17414913e+05  -2.01921161e+01  -1.10919070e+03  -3.27489243e-05
  -3.04625838e+04  -8.25524731e+03  -4.14444321e-01  -3.77917370e+07
  -6.72303755e+02   1.27068811e+05   1.12359266e+01  -2.57978901e+05
  -5.36154172e+04   1.17349174e+01  -1.71045966e+06  -7.64895526e+00
   2.91404563e+02   5.10067167e+02]

where the first entry 699.6533 is the coefficient corresponding to the constant term etc. all the way up to x37.

My problem is that the const term can appear at different positions in the summary (not necessarily first). I need a way either a) to label each factor with the label stored in the first position of its vector, or b) to always identify which entry in the summary (and hence in params) corresponds to the const term.

I would like to do this without using an additional package like pandas.

Please help.

Thank you!

You can find the constant position in

results.model.data.const_idx

If you do use pandas, then you could do

results.params['const']

But without it, you'll have to rely on

results.params[results.model.data.const_idx]

You can overwrite the default names for the parameters either for the model or in summary:

model.data.xnames = my_xnames

or

results.summary(xname=my_xnames)
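For option a) in the question, one way to pair labels with coefficients without pandas is a plain dict. This is only a minimal sketch: rows follows the label-first layout described in the question, params is a hypothetical stand-in for results.params, and all values are made up for illustration.

```python
# Hypothetical stand-ins: label-first rows as in the question, and a
# list playing the role of results.params (same order as the columns of X).
rows = [
    ["revenue", 123.0, 456.0, 789.0],
    ["costs", 12.0, 45.0, 78.0],
]
params = [0.5, -1.25]

# Split each row into its label and its numeric data.
my_xnames = [row[0] for row in rows]
data = [row[1:] for row in rows]

# Map each label to the coefficient in the same position.
coef_by_name = dict(zip(my_xnames, params))
print(coef_by_name["costs"])
```

If a constant column is added to X, my_xnames needs a matching entry (for example 'const') at that column's position, otherwise labels and coefficients will be misaligned.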

When a model is used in the pure numpy version, then names for the variables are created with a default pattern. The array of explanatory variables, X, is also checked for the presence of a constant. I thought this should set the default names for const to the correct line in the summary, but there might be a missing connection between detecting the constant and creating the parameter names.

update

The default name creation checks whether there is a column with variance equal to zero and assigns it the 'const' label:

def _make_exog_names(exog):
    exog_var = exog.var(0)
    if (exog_var == 0).any():
        # assumes one constant in first or last position
        # avoid exception if more than one constant
        const_idx = exog_var.argmin()
        exog_names = ['x%d' % i for i in range(1,exog.shape[1])]
        exog_names.insert(const_idx, 'const')

This name creation is independent of the const_idx attribute; no connection is made between the two.
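To illustrate the zero-variance idea without any library, here is a rough pure-Python sketch. It is not the statsmodels implementation: make_exog_names and its list-of-columns input are hypothetical, and the naming used when no constant is present is an assumption. Returning the index together with the names keeps the two in sync.

```python
def make_exog_names(columns):
    """Name columns x1, x2, ... and label a zero-variance column 'const'.

    `columns` is a list of columns (each a list of numbers). This is a
    pure-Python illustration, not the statsmodels implementation.
    """
    # A column has zero variance exactly when all its values are equal.
    const_idx = None
    for i, col in enumerate(columns):
        if len(set(col)) == 1:
            const_idx = i
            break  # use the first constant column found

    if const_idx is None:
        # Assumption: with no constant, every column gets an x-name.
        names = ["x%d" % i for i in range(1, len(columns) + 1)]
    else:
        names = ["x%d" % i for i in range(1, len(columns))]
        names.insert(const_idx, "const")
    return names, const_idx


names, const_idx = make_exog_names(
    [[1.0, 2.0, 3.0], [1.0, 1.0, 1.0], [4.0, 0.0, 2.0]]
)
print(names, const_idx)
```

With the constant in the middle column, the 'const' label lands at the same position as the returned index, so the index can be used directly on the parameter array.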

If you have an example where the location of the constant is not identified correctly this way, then you should open an issue with statsmodels on GitHub.

There is no nice setter method for changing the names, but setting model.data.xnames works for me:

>>> res_olsg.model.exog_names
['x1', 'x2', 'const']
>>> res_olsg.model.data.xnames
['x1', 'x2', 'const']

changing xnames

>>> res_olsg.model.data.xnames = ['x1', 'x2', 'not a const']
>>> res_olsg.model.exog_names
['x1', 'x2', 'not a const']

exog_names is read only

>>> res_olsg.model.exog_names= ['x1', 'x2', 'x3']
Traceback (most recent call last):
  File "<pyshell#12>", line 1, in <module>
    res_olsg.model.exog_names= ['x1', 'x2', 'x3']
AttributeError: can't set attribute


>>> print res_olsg.summary()
                            OLS Regression Results                            
...                                     
===============================================================================
                  coef    std err          t      P>|t|      [95.0% Conf. Int.]
-------------------------------------------------------------------------------
x1              4.3742      0.215     20.374      0.000         3.951     4.798
x2             -0.6140      0.285     -2.157      0.032        -1.175    -0.053
not a const    -9.4817      1.068     -8.874      0.000       -11.589    -7.375
==============================================================================
...

got it:

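def const_i(results):
    # position of 'const' in the parameter names (None if there is no constant)
    xnames = results.model.data.xnames
    return xnames.index("const") if "const" in xnames else None

This returns the index of the constant term in results.params.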
