Predicting values using an OLS model with statsmodels

I fitted a model using OLS (multiple linear regression). I split my data into training and test sets (half each), and I would now like to predict values for the second half of the labels.

model = OLS(labels[:half], data[:half])
predictions = model.predict(data[half:])

The problem is that I get an error:

File "/usr/local/lib/python2.7/dist-packages/statsmodels-0.5.0-py2.7-linux-i686.egg/statsmodels/regression/linear_model.py", line 281, in predict
    return np.dot(exog, params)
ValueError: matrices are not aligned

I have the following array shapes:

data.shape: (426, 215)
labels.shape: (426,)

If I transpose the input to model.predict, I do get a result, but with a shape of (426, 213), so I suppose it's wrong as well (I expect one vector of 213 numbers as label predictions):

model.predict(data[half:].T)

Any idea how to get it to work?

For statsmodels >= 0.4, if I remember correctly:

model.predict doesn't know about the fitted parameters and requires them as an argument in the call; see http://statsmodels.sourceforge.net/stable/generated/statsmodels.regression.linear_model.OLS.predict.html

What should work in your case is to fit the model and then use the predict method of the results instance.

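from statsmodels.api import OLS

model = OLS(labels[:half], data[:half])
results = model.fit()  # fit() returns a results instance that holds the estimated parameters
predictions = results.predict(data[half:])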

or, shorter:

results = OLS(labels[:half], data[:half]).fit()
predictions = results.predict(data[half:])

See http://statsmodels.sourceforge.net/stable/generated/statsmodels.regression.linear_model.RegressionResults.predict.html (the docstring is missing there).

Note: this has been changed in the development version (in a backwards-compatible way), so that predict can take advantage of "formula" information: http://statsmodels.sourceforge.net/devel/generated/statsmodels.regression.linear_model.RegressionResults.predict.html
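A rough sketch of what using formula information in predict can look like, assuming a statsmodels version that ships statsmodels.formula.api and that pandas is installed. The column names x and y and the small data set are made up for illustration.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Small illustrative data set (hypothetical column names "x" and "y")
df = pd.DataFrame({"x": [0.0, 1.0, 2.0, 3.0], "y": [1.0, 2.0, 3.5, 4.0]})

# The formula adds the intercept automatically
results = smf.ols("y ~ x", data=df).fit()

# New data only needs the raw predictor columns; the formula stored with the
# model rebuilds the design matrix (including the intercept) before predicting
new_data = pd.DataFrame({"x": [2.5, 4.0]})
predictions = results.predict(new_data)

print(predictions)
```

With a formula-based model, there is no need to add the constant column to the new data by hand.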

You can also call the get_prediction method of the results object to get the prediction together with its error estimate and confidence intervals. Example:

import numpy as np
import statsmodels.api as sm

X = np.array([0, 1, 2, 3])
y = np.array([1, 2, 3.5, 4])
X = sm.add_constant(X)
model = sm.OLS(y, X)
results = model.fit()

Predict:

# Predict at x=2.5
X_test = np.array([1, 2.5])  # "1" refers to the intercept term
results.get_prediction(X_test).summary_frame(alpha=0.05)  # alpha = significance level for confidence interval

gives:

     mean   mean_se  mean_ci_lower  mean_ci_upper  obs_ci_lower  obs_ci_upper
0   3.675  0.198431       2.821219       4.528781      2.142416      5.207584

where mean_ci refers to the confidence interval for the mean response and obs_ci refers to the prediction interval for a new observation.
