

Multiple linear regression in python without fitting the origin?

I found this chunk of code on http://rosettacode.org/wiki/Multiple_regression#Python , which does a multiple linear regression in Python. Printing b in the following code gives you the coefficients of x1, ..., xN. However, this code fits the line through the origin (i.e. the resulting model does not include a constant term).

All I'd like to do is the exact same thing, except I do not want to fit the line through the origin; I need the constant term in my resulting model.

Any idea if it's a small modification to do this? I've searched and found numerous documents on multiple regression in Python, but they are lengthy and overly complicated for what I need. This code works perfectly, except I need a model that includes an intercept rather than passing through the origin.

import numpy as np
from numpy.random import random

n = 100   # number of observations
k = 10    # number of predictors

y = np.mat(random((1, n)))   # response, shape (1, n)
X = np.mat(random((k, n)))   # predictors, shape (k, n)

# Closed-form normal-equations solution; note there is no intercept term.
b = y * X.T * np.linalg.inv(X * X.T)
print(b)

Any help would be appreciated. Thanks.

You just need to add a row of all 1's to X.
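
A minimal sketch of that change, reusing the variables from the question (the name X1 and the np.vstack/np.ones calls are my additions, not part of the original snippet):

import numpy as np
from numpy.random import random

n = 100
k = 10
y = np.mat(random((1, n)))
X = np.mat(random((k, n)))

# Append a row of ones so the model also estimates a constant term.
X1 = np.mat(np.vstack([np.asarray(X), np.ones((1, n))]))

# Same normal-equations solution as before, on the augmented matrix.
b = y * X1.T * np.linalg.inv(X1 * X1.T)

print(b)         # k slope coefficients followed by the intercept
print(b[0, -1])  # the intercept is the last entry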

Maybe a more stable approach would be to use a least-squares algorithm anyway. This can also be done in numpy in a few lines. Read the documentation for numpy.linalg.lstsq .

Here you can find an example implementation:

http://glowingpython.blogspot.de/2012/03/linear-regression-with-numpy.html
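
A short sketch of that route, assuming the more common rows-as-observations layout rather than the (k, n) layout from the question (the names A and coef here are mine, not from the linked post):

import numpy as np
from numpy.random import random

n, k = 100, 10
y = random(n)        # targets, shape (n,)
X = random((n, k))   # predictors, one row per observation

# Add a column of ones so lstsq also estimates an intercept.
A = np.column_stack([X, np.ones(n)])

# Least-squares fit; coef[:k] are the slopes, coef[k] is the intercept.
coef, residuals, rank, sv = np.linalg.lstsq(A, y, rcond=None)

print("slopes:   ", coef[:k])
print("intercept:", coef[k])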

What you have written out, b = y * X.T * np.linalg.inv(X * X.T) , is the solution to the normal equations, which gives the least-squares fit with a multilinear model. swang's response is correct (as is EMS's elaboration): you need to add a row of 1's to X. If you want some idea of why it works theoretically, keep in mind that you are finding b_i such that

y_j = sum_i b_i x_{ij}.

By adding a row of 1's, you are setting x_{(k+1)j} = 1 for all j, which means that you are finding b_i such that:

y_j = (sum_i b_i x_{ij}) + b_{k+1}

because the (k+1)-th x_ij term is always equal to one. Thus, b_{k+1} is your intercept term.
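
As a quick numerical sanity check of that argument (this snippet is mine, not from the answer), the normal-equations solution on the augmented matrix should match numpy.linalg.lstsq on the same data, with the intercept appearing as the last coefficient:

import numpy as np
from numpy.random import random

n, k = 100, 10
y = np.mat(random((1, n)))
X = np.mat(random((k, n)))

# Augmented design: the (k+1)-th row is all ones, so the last coefficient is b_{k+1}.
X1 = np.mat(np.vstack([np.asarray(X), np.ones((1, n))]))

# Normal-equations solution, as in the question but with the extra row.
b_normal = y * X1.T * np.linalg.inv(X1 * X1.T)

# Equivalent least-squares solution on the transposed (n-by-(k+1)) layout.
b_lstsq, *_ = np.linalg.lstsq(np.asarray(X1).T, np.asarray(y).ravel(), rcond=None)

# Both give the same k slopes and the same intercept, up to floating-point error.
print(np.allclose(np.asarray(b_normal).ravel(), b_lstsq))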
