
OLS Breusch Pagan test in Python

I used the statsmodels package to estimate my OLS regression. Now I want to run the Breusch-Pagan test. I used the pysal package for this test, but the function returns an error:

import statsmodels.api as sm
import pysal

model = sm.OLS(Y,X,missing = 'drop')
rs = model.fit()
pysal.spreg.diagnostics.breusch_pagan(rs)

returned error:

AttributeError: 'OLSResults' object has no attribute 'u'

What should I do?

The problem is that the regression results instance of statsmodels is not compatible with the one in pysal.

You can use het_breuschpagan from statsmodels, which takes the OLS residuals and candidate explanatory variables for the heteroscedasticity, so it does not rely on a specific model or implementation of a model.

documentation: https://www.statsmodels.org/devel/generated/statsmodels.stats.diagnostic.het_breuschpagan.html

with examples here https://www.statsmodels.org/devel/examples/notebooks/generated/regression_diagnostics.html

I do not know if there are any essential differences in the implementation of the Breusch-Pagan test.

It looks like the name is misspelled in statsmodels.

edit: The spelling of the name was corrected in statsmodels version 0.9. The old, misspelled name was het_breushpagan.

As I tend not to use the statsmodels library, I have created a Python function to perform the Breusch-Pagan test. It uses multiple linear regression from scikit-learn.

import numpy as np
from sklearn.linear_model import LinearRegression
from scipy.stats import chi2  # scipy.stats.chisqprob was removed in SciPy 1.0


def breusch_pagan_test(x, y):
    '''
    Breusch-Pagan test for heteroskedasticity in a linear regression model:
    H_0 = No heteroskedasticity.
    H_1 = Heteroskedasticity is present.

    Inputs:
    x = a numpy.ndarray containing the predictor variables. Shape = (nSamples, nPredictors).
    y = a 1D numpy.ndarray containing the response variable. Shape = (nSamples, ).

    Outputs a list containing three elements:
    1. the Breusch-Pagan test statistic.
    2. the p-value for the test.
    3. the test result.
    '''

    if y.ndim != 1:
        raise ValueError('y has more than 1 dimension.')
    if x.shape[0] != y.shape[0]:
        raise ValueError('the number of samples differs between x and y.')
    n_samples = y.shape[0]

    # fit an OLS linear model to y using x:
    lm = LinearRegression()
    lm.fit(x, y)

    # calculate the squared errors:
    err = (y - lm.predict(x))**2

    # fit an auxiliary regression to the squared errors:
    # why?: to estimate the variance in err explained by x
    lm.fit(x, err)
    pred_err = lm.predict(x)
    del lm

    # calculate the coefficient of determination:
    ss_tot = sum((err - np.mean(err))**2)
    ss_res = sum((err - pred_err)**2)
    r2 = 1 - (ss_res / ss_tot)
    del err, pred_err, ss_res, ss_tot

    # calculate the Lagrange multiplier:
    LM = n_samples * r2
    del r2

    # calculate p-value. degrees of freedom = number of predictors.
    # this is equivalent to (p - 1) parameter restrictions in Wikipedia entry.
    pval = chi2.sf(LM, x.shape[1])

    if pval < 0.01:
        test_result = 'Heteroskedasticity present at the 1% significance level.'
    elif pval < 0.05:
        test_result = 'Heteroskedasticity present at the 5% significance level.'
    else:
        test_result = 'No significant heteroskedasticity.'
    return [LM, pval, test_result]

