
regression OLS in python

I have some questions about multiple regression models in python:

  1. Why is it necessary to prepend a "dummy intercept" vector of ones before running the least squares method (OLS)? (I am referring to the use of X = sm.add_constant(X).) I know that the least squares method amounts to setting a system of derivatives equal to zero. Is it computed with some iterative method that makes a "dummy intercept" necessary? Where can I find some informative material about the details of the algorithm behind est = sm.OLS(y, X).fit()?

  2. As far as I understand, scale.fit_transform normalizes the data. Usually normalization does not produce values higher than 1. Why do I see values that exceed 1 once the data is scaled? (See the sketch after this list.)

  3. Where can I find official documentation about these python functions?
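A minimal sketch for question 2, assuming scale is scikit-learn's StandardScaler (the code that creates it is not shown, so that is a guess): fit_transform standardizes each column to zero mean and unit variance (z-scores); it does not compress values into [0, 1], so results beyond 1 are expected.

import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

X = np.array([[1.0], [2.0], [3.0], [10.0]])  # 10.0 sits far from the mean

scale = StandardScaler()  # assumed definition of `scale`
print(scale.fit_transform(X).ravel())
# [-0.849 -0.566 -0.283  1.697] -> the outlier's z-score exceeds 1

# MinMaxScaler is the transform that actually confines values to [0, 1]:
print(MinMaxScaler().fit_transform(X).ravel())
# [0.     0.111  0.222  1.   ]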

Thanks in advance

  1. In OLS the function you are trying to fit is y = a1*x1 + a2*x2 + a3*x3 + c. If you don't include the constant term c, your line will always pass through the origin. Hence, to give your line the extra degree of freedom of being offset from the origin by c, you need the constant.

You can fit a line without the constant term and you will still get a set of coefficients (the dummy intercept is not required by the computation itself), but that might not be the straight line that best minimizes the squared error.
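To make this concrete, here is a minimal sketch on synthetic data (all names illustrative) contrasting a fit with and without the constant column. It also shows that OLS is solved in closed form via the pseudo-inverse (statsmodels' default), not iteratively; the column of ones is simply how the intercept enters that solution.

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
# True model: intercept 5 plus three slopes, with a little noise.
y = 5.0 + X @ np.array([1.0, 2.0, 3.0]) + rng.normal(scale=0.1, size=100)

# With the "dummy intercept": add_constant prepends a column of ones,
# so the intercept is estimated like any other coefficient.
X_const = sm.add_constant(X)
with_c = sm.OLS(y, X_const).fit()
print(with_c.params)  # roughly [5, 1, 2, 3]
print(with_c.ssr)     # small residual sum of squares

# Without it, the fitted hyperplane is forced through the origin.
no_c = sm.OLS(y, X).fit()
print(no_c.ssr)       # much larger residual sum of squares

# Closed form: the normal equations / Moore-Penrose pseudo-inverse
# reproduce the fitted coefficients exactly.
beta = np.linalg.pinv(X_const) @ y
print(np.allclose(beta, with_c.params))  # True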
