Statsmodels OLS function for multiple regression parameters

Question

Let's say I want to find the alpha (a) values for an equation of the form:

y = a0 + a1*x1 + a2*x2 + ... + ai*xi

Using OLS, let's say we start with 10 values for the basic case of i = 2:

# y = a0 + a1*x1 + a2*x2
import numpy as np

y = np.arange(1, 10)
x = np.array([[ 5, 10], [10,  5], [ 5, 15],
              [15, 20], [20, 25], [25, 30], [30, 35],
              [35,  5], [ 5, 10], [10, 15]])

Using statsmodels, I would generally use the following code to obtain the coefficients for an n×1 x and y array:

import numpy as np
import statsmodels.api as sm

X = sm.add_constant(x)

# least squares fit
model = sm.OLS(y, X)
fit = model.fit()
alpha=fit.params

But this does not work when x is not the same shape as y. The equation is here on the first page if you do not know what OLS is.

Answer 1

The traceback tells you what's wrong:

    raise ValueError("endog and exog matrices are different sizes")
ValueError: endog and exog matrices are different sizes

Your x has 10 values, your y has 9 values. A regression only works if both have the same number of observations.

endog is y and exog is x; those are the names used in statsmodels for the dependent (endogenous) variable and the explanatory (exogenous) variables.

If you replace your y by

y = np.arange(1, 11)

then everything works as expected.

Answer 2

Here's the basic problem with the above: you say you're using 10 items, but your vector of y's only has 9.

>>> import numpy
>>> len(numpy.arange(1, 10))
9

This is because slices and ranges in Python go up to, but do not include, the stop integer. If you had done:

numpy.arange(10)

you would have had an array of 10 items, starting at 0 and ending with 9.
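The half-open behaviour is easy to check side by side. In this small sketch the variable names are just for illustration:

```python
import numpy as np

nine = np.arange(1, 10)          # 1, 2, ..., 9  (stop value 10 is excluded)
ten_from_one = np.arange(1, 11)  # 1, 2, ..., 10
ten_from_zero = np.arange(10)    # 0, 1, ..., 9

print(len(nine), len(ten_from_one), len(ten_from_zero))  # 9 10 10
```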

For a regression, you need a value of the predicted variable for every set of predictors. Otherwise, those predictors are useless, and you may as well discard any set of predictors that has no predicted value to go with it.
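If the extra predictors really have no matching response, one way to discard them is to trim both arrays to a common length. This sketch assumes the unmatched row is the last one in x, which the question does not state; if the missing response belongs to a different row, that row is the one to drop instead.

```python
import numpy as np

y = np.arange(1, 10)  # 9 responses, as in the question
x = np.array([[5, 10], [10, 5], [5, 15], [15, 20], [20, 25],
              [25, 30], [30, 35], [35, 5], [5, 10], [10, 15]])  # 10 rows

# Keep only the observations that have both a response and predictors
# (assumption: rows are aligned from the start, so the surplus is at the end)
n_obs = min(len(y), len(x))
y_matched = y[:n_obs]
x_matched = x[:n_obs]

print(y_matched.shape, x_matched.shape)  # (9,) (9, 2)
```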
