
How to apply the statsmodels.stats.diagnostic.compare_j test to linear and log-linear models

I'd like to apply the statsmodels.stats.diagnostic.compare_j test to linear and log-linear models. The linear model formula is

Sale_Price ~ Overall_Qual + Gr_Liv_Area + Neighborhood + MS_SubClass + Bsmt_Exposure + Roof_Matl + Misc_Feature + Overall_Cond + Year_Built + Bsmt_Full_Bath + Total_Bsmt_SF + 1

The log-linear model formula is

np.log(Sale_Price) ~ Overall_Qual + Gr_Liv_Area + Neighborhood + MS_SubClass + Bsmt_Exposure + Roof_Matl + Misc_Feature + Overall_Cond + Year_Built + Bsmt_Full_Bath + Total_Bsmt_SF + 1

(the same features, but with np.log(Sale_Price) instead of Sale_Price).

When I run the test, I get the error

ValueError: endogenous variables in models are not the same

Is it possible to compare linear and log-linear models with this method? Does such a comparison even make sense, or is neither model superior in this case? I ask because if I try the workaround

log_model.model.endog = np.exp(log_model.model.endog)

I get

ValueError: The exog in results_x and in results_z are nested. J comparison requires that models are non-nested.

I can't tell whether you are using a data frame; if you are, you need to create a new column with the log of Sale_Price and regress on that:

import numpy as np
import statsmodels.formula.api as smf
df['log_Sale_Price'] = np.log(df['Sale_Price'])
mod = smf.ols(formula='log_Sale_Price ~ Overall_Qual + Gr_Liv_Area..', data=df)
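A self-contained sketch of the same idea, fitting both the linear and the log-linear specification. The data frame below is synthetic and uses only two of the features (Overall_Qual, Gr_Liv_Area); the coefficients and noise level are made up for illustration, so with the real data you would substitute the full formula from the question.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical synthetic data standing in for the real housing data set
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "Overall_Qual": rng.integers(1, 11, size=n),   # integers 1..10
    "Gr_Liv_Area": rng.uniform(500, 3000, size=n),
})
# Multiplicative noise, so log(Sale_Price) is linear in the features
df["Sale_Price"] = np.exp(
    10 + 0.1 * df["Overall_Qual"] + 0.0004 * df["Gr_Liv_Area"]
    + rng.normal(0, 0.1, size=n)
)

# Store the log-transformed response in its own column, as suggested above
df["log_Sale_Price"] = np.log(df["Sale_Price"])

lin_res = smf.ols("Sale_Price ~ Overall_Qual + Gr_Liv_Area", data=df).fit()
log_res = smf.ols("log_Sale_Price ~ Overall_Qual + Gr_Liv_Area", data=df).fit()

print(lin_res.params)
print(log_res.params)
```

Note that lin_res and log_res still have different dependent variables (Sale_Price versus its log), which is exactly what compare_j checks for.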

As for your second question: you should not use statsmodels.stats.diagnostic.compare_j here, because the dependent variables are on different scales. This function presumably implements the same J test as R's jtest (from the lmtest package), whose manual says:

The J test statistic is simply the marginal test of the fitted values in the augmented model.

Since the predicted values from the log model are on a different (logarithmic) scale from those of the untransformed model, this will not work.

If I understand your question correctly, you want to know whether a log transformation of the dependent variable gives a better fit.

The primary reason for transforming the dependent variable is to make the residuals follow a Gaussian distribution more closely. You can simply plot the residuals against the predicted values to check this, as in this example. You can also apply the Breusch-Pagan test for heteroskedasticity and check whether the result improves with the log transformation.
