Statsmodels - Wald Test for significance of trend in coefficients in Linear Regression Model (OLS)

I have used Statsmodels to fit an OLS linear regression model that predicts a dependent variable from about 10 independent variables. The independent variables are all categorical.

I am interested in looking more closely at the significance of the coefficients for one of the independent variables. It has 4 categories, so 3 coefficients, each of which is highly significant. I would also like to look at the significance of the trend across all 3 categories. From my (limited) understanding, this is often done using a Wald test that compares all of the coefficients to 0.

How exactly is this done using Statsmodels? I see there is a Wald Test method for the OLS function. It seems you have to pass in values for all of the coefficients when using this method.

My approach was the following...

First, here are all of the coefficients:

np.array(lm.params) = array([ 0.21538725,  0.05675108,  0.05020252,  0.08112228,  0.00074715,
        0.03886747,  0.00981819,  0.19907263,  0.13962354,  0.0491201 ,
       -0.00531318,  0.00242845, -0.0097336 , -0.00143791, -0.01939182,
       -0.02676771,  0.01649944,  0.01240742, -0.00245309,  0.00757727,
        0.00655152, -0.02895381, -0.02027537,  0.02621716,  0.00783884,
        0.05065323,  0.04264466, -0.13068456, -0.15694931, -0.25518566,
       -0.0308599 , -0.00558183,  0.02990139,  0.02433505, -0.01582824,
       -0.00027538,  0.03170669,  0.01130944,  0.02631403])

I am only interested in params 2-4 (which are the 3 coefficients of interest).

coeffs = np.zeros_like(lm.params)
coeffs[1:4] = [0.05675108,  0.05020252,  0.08112228]

Checking to make sure this worked:

array([ 0.        ,  0.05675108,  0.05020252,  0.08112228,  0.        ,
        0.        ,  0.        ,  0.        ,  0.        ,  0.        ,
        0.        ,  0.        ,  0.        ,  0.        ,  0.        ,
        0.        ,  0.        ,  0.        ,  0.        ,  0.        ,
        0.        ,  0.        ,  0.        ,  0.        ,  0.        ,
        0.        ,  0.        ,  0.        ,  0.        ,  0.        ,
        0.        ,  0.        ,  0.        ,  0.        ,  0.        ,
        0.        ,  0.        ,  0.        ,  0.        ])

Looks good, now to run the test!

lm.wald_test(coeffs) = 
<class 'statsmodels.stats.contrast.ContrastResults'>
<F test: F=array([[ 13.11493673]]), p=0.000304699208434, df_denom=1248, df_num=1>

Is this the correct approach? I could really use some help!

A linear hypothesis has the form R params = q where R is the matrix that defines the linear combination of parameters and q is the hypothesized value.

In the simple case where we want to test whether some parameters are zero, the R matrix has a 1 in the column corresponding to the position of the parameter and zeros everywhere else, and q is zero, which is the default. Each row specifies a linear combination of parameters, which defines a hypothesis as part of the overall or joint hypothesis.

In this case, the simplest way to get the restriction matrix is to take the corresponding rows of an identity matrix:

R = np.eye(len(lm.params))[1:4]

Then, lm.wald_test(R) will provide the test for the joint hypothesis that the 3 parameters are zero.

A simpler way to specify the restriction is by using the names of the parameters and defining the restrictions by a list of strings.

The model result classes also have a new method, wald_test_terms, which automatically generates the Wald tests for terms in the design matrix where the hypothesis includes several parameters or columns, as is the case for categorical or polynomial explanatory variables. This is available in statsmodels master and will be in the upcoming 0.7 release.
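
For a sense of how wald_test_terms might be used, here is a minimal sketch that assumes the model is specified through the formula interface, so that statsmodels knows which columns belong to which term. The data frame and its column names are invented for the example.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic data (hypothetical): one 4-level categorical and one numeric regressor
rng = np.random.default_rng(2)
n = 200
df = pd.DataFrame({
    "group": rng.choice(["a", "b", "c", "d"], size=n),
    "x": rng.normal(size=n),
})
effect = df["group"].map({"a": 0.0, "b": 0.5, "c": 0.4, "d": 0.8})
df["y"] = 1.0 + effect + 0.3 * df["x"] + rng.normal(size=n)

res = smf.ols("y ~ C(group) + x", data=df).fit()

# One joint Wald test per term; C(group) covers all 3 dummy coefficients at once
terms = res.wald_test_terms()
print(terms.summary_frame())
```

The row for C(group) is the joint test that all of its dummy coefficients are zero, with no restriction matrix built by hand.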

