Using statsmodels OLS on a test set

Question

I would like to use a technique from scikit-learn, namely ShuffleSplit, to benchmark my linear regression model on a sequence of randomized train and test sets. This is well established and works well for LinearRegression in scikit-learn using:

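from sklearn.linear_model import LinearRegression
LM = LinearRegression()
# for each train_index / test_index pair produced by ShuffleSplit:
LM.fit(X[train_index], Y[train_index])
train_score = LM.score(X[train_index], Y[train_index])
test_score = LM.score(X[test_index], Y[test_index])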

The score one gets here is only the R² value and nothing more. The statsmodels OLS implementation gives a much richer set of scores, among which are adjusted R², AIC, BIC, etc. However, there one can only fit the model with the training data to get these scores. Is there a way to get them for the test set as well?

So in my example:

from sklearn.model_selection import ShuffleSplit
from statsmodels.regression.linear_model import OLS

ss = ShuffleSplit(n_splits=40, train_size=0.15, random_state=42)
for train_index, test_index in ss.split(X):
    regr = OLS(Y[train_index], X[train_index]).fit()
    train_score_AIC = regr.aic

is there a way to add something like

    test_score_AIC = regr.test(Y[test_index], X[test_index]).aic

Answer 1

Most of those measures are goodness-of-fit measures that are built into the model/results classes and are only available for the training data or estimation sample. Many of them are not well defined as out-of-sample predictive accuracy measures, or at least I have never seen definitions that would fit that case.

Specifically, loglike is a method of the model and can only be evaluated on the attached training sample.
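To illustrate what "attached sample" means, here is a minimal sketch with synthetic data. It assumes that building a second, unfitted OLS model on the test rows and passing the trained parameters to its loglike method is acceptable for your purpose; this is an unsupported workaround, not an out-of-sample API of statsmodels. The data, the fixed split, and all variable names are made up for the example.

```python
import numpy as np
from statsmodels.regression.linear_model import OLS

# Synthetic data (assumption): intercept plus two regressors.
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(200), rng.normal(size=(200, 2))])
Y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(scale=0.3, size=200)

# A fixed split for illustration; indices from ShuffleSplit would work the same way.
train_index, test_index = np.arange(150), np.arange(150, 200)

regr = OLS(Y[train_index], X[train_index]).fit()

# The fitted model evaluates the log-likelihood on its own (training) sample.
train_llf = regr.model.loglike(regr.params)

# Workaround: attach the test rows to a new model and evaluate the
# trained parameters there. Without an explicit scale, the error variance
# is taken from the residuals of the attached (test) sample.
test_model = OLS(Y[test_index], X[test_index])
test_llf = test_model.loglike(regr.params)

print(train_llf, test_llf)
```

Whether the resulting number is a meaningful test score is exactly the open question raised above, so treat it as a diagnostic rather than a standard measure.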

Related issues:

https://github.com/statsmodels/statsmodels/issues/2572
https://github.com/statsmodels/statsmodels/issues/1282

It would be possible to partially work around the current limitations of statsmodels, but none of those workarounds is currently supported or unit tested.
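One such partial workaround, sketched here under stated assumptions, is to use the fitted results only for prediction and compute test-set measures by hand. The helper out_of_sample_scores is hypothetical and not part of statsmodels. Its "aic_like" value is an ad hoc construction (a Gaussian log-likelihood on the test residuals using the training-sample error variance, penalized by the number of parameters), not an established definition. The random split uses plain NumPy; indices from ShuffleSplit could be used instead.

```python
import numpy as np
from statsmodels.regression.linear_model import OLS


def out_of_sample_scores(results, y_test, X_test):
    """Hypothetical helper: predictive measures on held-out data."""
    y_test = np.asarray(y_test, dtype=float)
    pred = np.asarray(results.predict(X_test))
    resid = y_test - pred
    n = y_test.shape[0]
    k = len(results.params)
    sse = float(resid @ resid)
    # Total sum of squares around the test-sample mean (one possible convention).
    sst = float(np.sum((y_test - y_test.mean()) ** 2))
    # Error variance estimated on the training sample.
    sigma2 = float(results.mse_resid)
    # Gaussian log-likelihood of the test residuals at the training estimates.
    llf = -0.5 * n * np.log(2 * np.pi * sigma2) - sse / (2 * sigma2)
    return {
        "mse": sse / n,
        "r2": 1.0 - sse / sst,
        "llf": llf,
        "aic_like": 2 * k - 2 * llf,  # ad hoc, see the note above
    }


# Synthetic data (assumption): intercept plus two regressors.
rng = np.random.default_rng(42)
X = np.column_stack([np.ones(200), rng.normal(size=(200, 2))])
Y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(scale=0.3, size=200)

all_scores = []
for _ in range(5):
    perm = rng.permutation(len(Y))
    train_index, test_index = perm[:150], perm[150:]
    regr = OLS(Y[train_index], X[train_index]).fit()
    all_scores.append(out_of_sample_scores(regr, Y[test_index], X[test_index]))

print(all_scores[0])
```

Mean squared error and a test-set R² are the better-understood quantities here; the likelihood-based values are included only to show what a hand-rolled analogue could look like.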

Solution 1
1 accepted 2019-04-17 17:09:05