
Statsmodels OLS function for multiple regression parameters

Let's say I want to find the alpha (a) values for an equation of the form

y = a0 + a1*x1 + a2*x2 + ... + ai*xi

Using OLS, let's say we start with 10 values for the basic case of i = 2:

# y = a0 + a1*x1 + a2*x2

y = np.arange(1, 10)
x = np.array([[ 5, 10], [10,  5], [ 5, 15],
       [15, 20], [20, 25], [25, 30],[30, 35],
       [35,  5], [ 5, 10], [10, 15]])

Using statsmodels, I would generally use the following code to obtain the coefficients for n x 1 x and y arrays:

import numpy as np
import statsmodels.api as sm

X = sm.add_constant(x)

# least squares fit
model = sm.OLS(y, X)
fit = model.fit()
alpha=fit.params

But this does not work when x does not have the same shape as y. The equation is here on the first page if you do not know what OLS is.

The traceback tells you what's wrong:

    raise ValueError("endog and exog matrices are different sizes")
ValueError: endog and exog matrices are different sizes

Your x has 10 values, your y has 9 values. A regression only works if both have the same number of observations.

endog is y and exog is x; those are the names used in statsmodels for the dependent (response) variable and the explanatory variables.

If you replace your y by

y = np.arange(1, 11)

then everything works as expected.

Here's the basic problem with the above: you say you're using 10 items, but you're only using 9 for your vector of y's.

>>> import numpy
>>> len(numpy.arange(1, 10))
9

This is because slices and ranges in Python go up to but do not include the stop integer. If you had done:

numpy.arange(10)

you would have had an array of 10 items, starting at 0 and ending with 9.
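The half-open behaviour is easy to confirm by comparing lengths. A small sketch (the variable names are illustrative only):

```python
import numpy as np

nine = np.arange(1, 10)    # 1..9, the stop value 10 is excluded
ten = np.arange(1, 11)     # 1..10, one value per row of x
zero_based = np.arange(10) # 0..9

print(len(nine), len(ten), len(zero_based))
print(zero_based[0], zero_based[-1])
```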

For a regression, you require a predicted variable for every set of predictors. Otherwise, the predictors are useless. You may as well discard the set of predictors that do not have a predicted variable to go with them.
