
Using statsmodels OLS on a test-set

I would like to use a technique from scikit-learn, namely ShuffleSplit, to benchmark my linear regression model on a sequence of randomized train and test sets. This is well established and works well for LinearRegression in scikit-learn:

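from sklearn.linear_model import LinearRegression
LM = LinearRegression()
# the model must be fit on the training split before it can be scored
LM.fit(X[train_index], Y[train_index])
train_score = LM.score(X[train_index], Y[train_index])
test_score = LM.score(X[test_index], Y[test_index])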
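A self-contained sketch of that benchmark, with the ShuffleSplit loop written out, could look like the following. The synthetic data, the true coefficients and the seed are assumptions made only so the example runs on its own.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import ShuffleSplit

# Synthetic data (assumption for illustration): y = 1 + 2*x0 - x1 + noise
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
Y = 1.0 + 2.0 * X[:, 0] - X[:, 1] + rng.normal(scale=0.5, size=200)

# 15% of the rows for training; the remaining 85% form the test split
ss = ShuffleSplit(n_splits=40, train_size=0.15, random_state=42)
train_scores, test_scores = [], []
for train_index, test_index in ss.split(X):
    LM = LinearRegression()
    LM.fit(X[train_index], Y[train_index])  # fit on the training split only
    train_scores.append(LM.score(X[train_index], Y[train_index]))  # in-sample R²
    test_scores.append(LM.score(X[test_index], Y[test_index]))     # out-of-sample R²

print(f"mean train R^2: {np.mean(train_scores):.3f}")
print(f"mean test R^2:  {np.mean(test_scores):.3f}")
```

A fresh LinearRegression is created per split so that no state leaks between splits; calling fit again on the same object would also work.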

The score returned here is only the R² value and nothing more. The statsmodels OLS implementation for linear models gives a much richer set of measures, among which are adjusted R², AIC, BIC, etc. However, these are only available for the data the model was fit on, i.e. the training data. Is there a way to get them for the test set as well?

So, in my example:

from sklearn.model_selection import ShuffleSplit
from statsmodels.regression.linear_model import OLS

ss = ShuffleSplit(n_splits=40, train_size=0.15, random_state=42)
for train_index, test_index in ss.split(X):
    # X and Y are NumPy arrays; OLS does not add an intercept column by itself
    regr = OLS(Y[train_index], X[train_index]).fit()
    train_score_AIC = regr.aic  # AIC of the training (estimation) sample

Is there a way to add something like the following?

    test_score_AIC = regr.test(Y[test_index], X[test_index]).aic  # hypothetical; the results object has no such method

Most of those measures are goodness-of-fit measures that are built into the model and results classes, and they are only available for the training data, i.e. the estimation sample. Many of them are not well defined as out-of-sample predictive accuracy measures, or at least I have never seen definitions that would fit that case.

Specifically, loglike is a method of the model and can only be evaluated on the training sample attached to that model.

Related issues:

https://github.com/statsmodels/statsmodels/issues/2572
https://github.com/statsmodels/statsmodels/issues/1282

It would be possible to partially work around these current limitations of statsmodels, but none of those workarounds are currently supported or unit tested.
