Return std and confidence intervals for out-of-sample prediction in StatsModels

Question

I'd like to find the standard deviation and confidence intervals for an out-of-sample prediction from an OLS model.

This question is similar to "Confidence intervals for model prediction", but with an explicit focus on out-of-sample data.

Ideally there would be a function along the lines of wls_prediction_std(lm, data_to_use_for_prediction=out_of_sample_df) that returns prstd, iv_l, iv_u for that out-of-sample DataFrame.
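For reference, these are the three quantities as wls_prediction_std computes them (with unit weights) for one new row x_0 of the design matrix. Note that x_0 has to include the leading 1 for the intercept; \hat{\sigma}^2 is the model's mse_resid, n - p is its df_resid, and alpha defaults to 0.05.

```latex
\hat{y}_0 = x_0^{\top}\hat{\beta}, \qquad
\mathrm{prstd}_0 = \sqrt{\hat{\sigma}^2 + x_0^{\top}\,\widehat{\mathrm{Cov}}(\hat{\beta})\,x_0}, \qquad
\left[\mathrm{iv\_l}_0,\ \mathrm{iv\_u}_0\right] = \hat{y}_0 \mp t_{1-\alpha/2,\,n-p}\;\mathrm{prstd}_0
```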

For instance:

import pandas as pd
import random
import statsmodels.formula.api as smf
from statsmodels.sandbox.regression.predstd import wls_prediction_std

df = pd.DataFrame({"y":[x for x in range(10)],
                   "x1":[(x*5 + random.random() * 2) for x in range(10)],
                   "x2":[(x*2.1 + random.random()) for x in range(10)]})

out_of_sample_df = pd.DataFrame({"x1":[(x*3 + random.random() * 2) for x in range(10)],
                                 "x2":[(x + random.random()) for x in range(10)]})

formula_string = "y ~ x1 + x2"
lm = smf.ols(formula=formula_string, data=df).fit()

# Prediction works fine:
print(lm.predict(out_of_sample_df))

# I can also get std and CI for in-sample data:
prstd, iv_l, iv_u = wls_prediction_std(lm)
print(prstd)

# I cannot figure out how to get std and CI for out-of-sample data:
try:
    print(wls_prediction_std(lm, exog=out_of_sample_df))
except ValueError as e:
    print(str(e))
    # raises ValueError, so this prints: wrong shape of exog

# trying to concatenate the DFs:
df_both = pd.concat([df, out_of_sample_df],
                    ignore_index=True)

# Only returns results for the rows from df: the out_of_sample_df rows have
# y = NaN, so the formula interface drops them before fitting
lm2 = smf.ols(formula=formula_string, data=df_both).fit()
prstd2, iv_l2, iv_u2 = wls_prediction_std(lm2)
print(prstd2)
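A minimal sketch of why the error occurs, assuming a recent statsmodels and NumPy and using seeded synthetic data in place of the question's random.random() values: the formula adds an Intercept column, so the fitted model has three parameters while out_of_sample_df has only two columns. Prepending a column of ones by hand (X_new is just an illustrative name) gives wls_prediction_std an exog of the right shape. This shortcut only works when the formula has no transformations, and the column order must match lm.model.exog_names.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.sandbox.regression.predstd import wls_prediction_std

# Seeded stand-in for the question's data (assumption: same shape, same formula).
rng = np.random.default_rng(0)
x = np.arange(10)
df = pd.DataFrame({"y": x.astype(float),
                   "x1": x * 5 + rng.random(10) * 2,
                   "x2": x * 2.1 + rng.random(10)})
out_of_sample_df = pd.DataFrame({"x1": x * 3 + rng.random(10) * 2,
                                 "x2": x + rng.random(10)})

lm = smf.ols("y ~ x1 + x2", data=df).fit()

# The design matrix has an Intercept column that out_of_sample_df lacks,
# which is why wls_prediction_std raises "wrong shape of exog".
print(lm.model.exog_names)        # ['Intercept', 'x1', 'x2']
print(out_of_sample_df.shape[1])  # 2

# Prepend a column of ones so the columns line up with exog_names.
X_new = np.column_stack([np.ones(len(out_of_sample_df)),
                         out_of_sample_df[["x1", "x2"]].to_numpy()])
prstd, iv_l, iv_u = wls_prediction_std(lm, exog=X_new)
print(prstd)
```

The midpoint of each interval is the ordinary point prediction, so (iv_l + iv_u) / 2 should agree with lm.predict(out_of_sample_df).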

Answer 1

It looks like the problem is the format of the exog parameter: wls_prediction_std expects the full design matrix the model was fitted on (including the Intercept column that the formula adds), not the raw out-of-sample DataFrame. This method is 100% stolen from this workaround by GitHub user thatneat. It is necessary because of this bug.

import numpy as np
from patsy import dmatrix

def transform_exog_to_model(fit, exog):
    transform = True
    self = fit

    # The following is adapted from statsmodels.base.model.Results.predict()
    if transform and hasattr(self.model, 'formula') and exog is not None:
        # Rebuild the design matrix (Intercept column, formula transforms)
        # from the fitted model's patsy DesignInfo. Older statsmodels/patsy
        # versions used self.model.data.orig_exog.design_info.builder here.
        exog = dmatrix(self.model.data.design_info, exog)

    if exog is not None:
        exog = np.asarray(exog)
        if exog.ndim == 1 and (self.model.exog.ndim == 1 or
                               self.model.exog.shape[1] == 1):
            exog = exog[:, None]
        exog = np.atleast_2d(exog)  # needed in count model shape[1]

    # end lifted code
    return exog

transformed_exog = transform_exog_to_model(lm, out_of_sample_df)
print(transformed_exog)
prstd2, iv_l2, iv_u2 = wls_prediction_std(lm, transformed_exog, weights=[1])
print(prstd2)
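On current statsmodels releases the same idea can be written more directly. This sketch assumes that a formula-fitted model exposes its patsy DesignInfo as lm.model.data.design_info, which to my knowledge holds for recent versions. Rebuilding the matrix from the stored design info also applies any formula transformations, which a hand-built column of ones would miss; the I(x1 ** 2) term is only an illustration, and the data are seeded synthetic values.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from patsy import dmatrix
from statsmodels.sandbox.regression.predstd import wls_prediction_std

rng = np.random.default_rng(1)
x = np.arange(10)
df = pd.DataFrame({"y": x + rng.normal(scale=0.5, size=10),
                   "x1": x * 5 + rng.random(10) * 2,
                   "x2": x * 2.1 + rng.random(10)})
out_of_sample_df = pd.DataFrame({"x1": x * 3 + rng.random(10) * 2,
                                 "x2": x + rng.random(10)})

# A formula with a transformation: new data must go through the same transform.
lm = smf.ols("y ~ x1 + I(x1 ** 2) + x2", data=df).fit()

# Assumption: the patsy DesignInfo from fitting is stored on model.data.
design_info = lm.model.data.design_info
X_new = dmatrix(design_info, out_of_sample_df, return_type="dataframe")
print(list(X_new.columns), lm.model.exog_names)

prstd, iv_l, iv_u = wls_prediction_std(lm, exog=X_new)
print(prstd)
```

Because X_new is built from the same DesignInfo, its columns match lm.model.exog_names one for one, so no manual bookkeeping of column order is needed.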

Answer 2

Alternatively, you can use the get_prediction method.

# lm is the fitted model from the question
predictions = lm.get_prediction(out_of_sample_df)
print(predictions.summary_frame(alpha=0.05))

This returns both the confidence interval for the predicted mean and the prediction interval for a new observation (the mean_ci_* and obs_ci_* columns respectively). I found the summary_frame() method buried here, and you can find the get_prediction() method here. You can change the significance level of both intervals with the alpha parameter.
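A self-contained sketch of the get_prediction route, using seeded synthetic data in place of the question's. It also recovers a prstd-style standard deviation from the summary frame: for an unweighted OLS fit, the prediction-interval variance should be mean_se squared plus the residual variance lm.scale, which is the same quantity wls_prediction_std returns. Treat that equivalence as my reading of the two code paths rather than a documented guarantee.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(2)
x = np.arange(10)
df = pd.DataFrame({"y": x + rng.normal(scale=0.5, size=10),
                   "x1": x * 5 + rng.random(10) * 2,
                   "x2": x * 2.1 + rng.random(10)})
out_of_sample_df = pd.DataFrame({"x1": x * 3 + rng.random(10) * 2,
                                 "x2": x + rng.random(10)})

lm = smf.ols("y ~ x1 + x2", data=df).fit()

# get_prediction applies the formula (adding the Intercept) to the new data itself.
predictions = lm.get_prediction(out_of_sample_df)
frame95 = predictions.summary_frame(alpha=0.05)  # 95% intervals
frame90 = predictions.summary_frame(alpha=0.10)  # narrower 90% intervals
print(frame95)

# Standard deviation of a new observation (assumption: unweighted OLS,
# so the residual variance is lm.scale, i.e. mse_resid).
prstd = np.sqrt(frame95["mean_se"] ** 2 + lm.scale)
print(prstd)
```

The obs_ci_* columns are always at least as wide as mean_ci_*, since they add the residual variance on top of the uncertainty in the fitted mean.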

Answer 1: score 6, accepted, 2015-09-15 20:15:09

Answer 2: score 1, 2017-11-09 00:15:11
