I used the statsmodels package to estimate my OLS regression. Now I want to run a Breusch-Pagan test. I used the pysal package for this test, but the function returns an error:
import statsmodels.api as sm
import pysal
model = sm.OLS(Y,X,missing = 'drop')
rs = model.fit()
pysal.spreg.diagnostics.breusch_pagan(rs)
This raises the following error:
AttributeError: 'OLSResults' object has no attribute 'u'
What should I do?
The problem is that the regression results instance from statsmodels is not compatible with the one pysal expects.
You can use het_breuschpagan from statsmodels instead. It takes the OLS residuals and the candidate explanatory variables for the heteroscedasticity directly, so it does not rely on a specific model or implementation of a model.
Documentation: https://www.statsmodels.org/devel/generated/statsmodels.stats.diagnostic.het_breuschpagan.html
with examples here: https://www.statsmodels.org/devel/examples/notebooks/generated/regression_diagnostics.html
I do not know if there are any essential differences in the implementation of the Breusch-Pagan test.
It looks like the name was misspelled in statsmodels.
Edit: The spelling has been corrected in statsmodels version 0.9; the old, incorrect spelling was breushpagan.
As I tend not to use the statsmodels library, I have created a Python function to perform the Breusch-Pagan test. It uses multiple linear regression from scikit-learn.
import numpy as np
from sklearn.linear_model import LinearRegression
from scipy.stats import chi2  # chisqprob was removed from SciPy; use chi2.sf instead

def breusch_pagan_test(x, y):
    '''
    Breusch-Pagan test for heteroskedasticity in a linear regression model:
    H_0 = No heteroskedasticity.
    H_1 = Heteroskedasticity is present.

    Inputs:
    x = a numpy.ndarray containing the predictor variables. Shape = (nSamples, nPredictors).
    y = a 1D numpy.ndarray containing the response variable. Shape = (nSamples, ).

    Outputs a list containing three elements:
    1. the Breusch-Pagan test statistic.
    2. the p-value for the test.
    3. the test result.
    '''
    if y.ndim != 1:
        raise ValueError('y has more than 1 dimension.')
    if x.shape[0] != y.shape[0]:
        raise ValueError('the number of samples differs between x and y.')
    n_samples = y.shape[0]

    # fit an OLS linear model to y using x:
    lm = LinearRegression()
    lm.fit(x, y)

    # calculate the squared errors:
    err = (y - lm.predict(x)) ** 2

    # fit an auxiliary regression to the squared errors,
    # to estimate the variance in err explained by x:
    lm.fit(x, err)
    pred_err = lm.predict(x)

    # calculate the coefficient of determination:
    ss_tot = np.sum((err - np.mean(err)) ** 2)
    ss_res = np.sum((err - pred_err) ** 2)
    r2 = 1 - (ss_res / ss_tot)

    # calculate the Lagrange multiplier statistic:
    LM = n_samples * r2

    # calculate the p-value. Degrees of freedom = number of predictors;
    # this is equivalent to (p - 1) parameter restrictions in the Wikipedia entry.
    pval = chi2.sf(LM, x.shape[1])

    if pval < 0.01:
        test_result = 'Heteroskedasticity present at the 1% significance level.'
    elif pval < 0.05:
        test_result = 'Heteroskedasticity present at the 5% significance level.'
    else:
        test_result = 'No significant heteroskedasticity.'
    return [LM, pval, test_result]