
Linear Regression Coefficients

I am currently using statsmodels (although I would also be happy to use Scikit) to create a linear regression. On this particular model I am finding that when I add more than one factor, the OLS algorithm produces wild coefficients. These coefficients are both extremely high and extremely low, and they appear to cancel each other out in the optimisation. The result is that all of the factors come out statistically insignificant. Is there a way to put an upper or lower limit on the coefficients, so that OLS has to optimise within those boundaries?

I don't know if you can impose a constraint on OLS that the absolute value of every coefficient must be less than a constant.

Regularization is a good alternative for this kind of problem, though. L1 and L2 regularization penalize the magnitude of the coefficients in the objective function (the sum of their absolute values for L1, the sum of their squares for L2), which pushes the coefficients of the least significant variables toward zero so they do not inflate the cost.
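To make the penalty terms concrete, here is a small sketch (the coefficient vector `beta` is made up for illustration) computing the L1 and L2 penalties that lasso and ridge add to the least-squares cost:

```python
import numpy as np

# Hypothetical coefficient vector with one "wild" value
beta = np.array([250.0, -0.3, 1.2])

# L1 penalty: sum of absolute values (used by lasso)
l1 = np.abs(beta).sum()

# L2 penalty: sum of squares (used by ridge)
l2 = (beta ** 2).sum()

# A strength parameter scales the penalty before it is added to the
# least-squares cost, so large coefficients become expensive to keep
print(l1, l2)
```

Because the wild coefficient dominates both penalties, the optimizer is pushed toward solutions with smaller, better-behaved coefficients.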

Take a look at lasso, ridge, and elastic net regression. They use L1, L2, and both forms of regularization, respectively.
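A quick sketch of the effect in scikit-learn, using synthetic data (the nearly collinear predictors below are an assumption, chosen because collinearity is a common cause of the huge offsetting coefficients described in the question):

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Lasso, Ridge

rng = np.random.default_rng(0)
n = 100
# Two nearly collinear predictors: OLS can split their shared effect
# into huge offsetting coefficients
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.01, size=n)
X = np.column_stack([x1, x2])
y = 3 * x1 + rng.normal(scale=0.1, size=n)

ols = LinearRegression().fit(X, y)
ridge = Ridge(alpha=1.0).fit(X, y)    # L2 penalty
lasso = Lasso(alpha=0.1).fit(X, y)    # L1 penalty

# The regularized fits keep the coefficients near the true joint
# effect of 3, while plain OLS is free to blow them up
print(ols.coef_, ridge.coef_, lasso.coef_)
```

Lasso will additionally tend to zero out one of the redundant predictors, while ridge spreads the effect across both.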

You can try the following in statsmodels:

# Import OLS
from statsmodels.regression.linear_model import OLS

# Initialize model
reg = OLS(endog=y, exog=X)

# Fit model
reg = reg.fit_regularized()
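`fit_regularized` fits an elastic net by default: `alpha` sets the penalty strength and `L1_wt` blends the two penalties (1.0 is pure lasso, 0.0 is pure ridge). A runnable sketch on synthetic data (the data and parameter values are illustrative assumptions):

```python
import numpy as np
from statsmodels.regression.linear_model import OLS

# Synthetic data: the middle predictor is irrelevant (true coefficient 0)
rng = np.random.default_rng(42)
X = rng.normal(size=(100, 3))
y = X @ np.array([2.0, 0.0, -1.0]) + rng.normal(scale=0.5, size=100)

# alpha sets the penalty strength; L1_wt=1.0 gives a pure lasso fit
res = OLS(endog=y, exog=X).fit_regularized(alpha=0.1, L1_wt=1.0)

# Irrelevant coefficients are shrunk toward (or exactly to) zero
print(res.params)
```

Note that the regularized result exposes `params` but not the usual OLS inference output, so you would typically use it for estimation and variable selection rather than for p-values.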
