简体   繁体   English

通过Householder成功找到Q&R之后,无法获得R中的线性回归系数

[英]Unable to get Linear Regression Cofficients in R after Successfully finding Q & R via Householder

I am manually trying to calculate the regression coefficients rather than using any default for data http://people.sc.fsu.edu/~jburkardt/datasets/regression/x31.txt 我正在手动尝试计算回归系数,而不是对数据使用任何默认值http://people.sc.fsu.edu/~jburkardt/datasets/regression/x31.txt

Here is my code, which properly produces Q&R satisfying A=QR. 这是我的代码,可以正确产生满足A = QR的Q&R。 But I am not able to find coefficients as dimensions of Q & R creating issues. 但是我找不到系数作为问与答产生问题的维度。 Can any experts help me here? 有什么专家可以帮助我吗? When I have proper Q & R how finding coefficients can go wrong? 当我有适当的问与答时,如何找到系数可能会出错?

library(xlsx)
data.df<-read.xlsx("regression.xlsx",2,header = F)
#Remove unneccesary Index position
data.df<-data.df[2:5]

#Decomposing
#coefficients [b]=inv(t(X)(Matrix Multiplication)(X))(Matrix Multiplication)t(X)(Matrix Multiplication)y
Y<-as.matrix(data.df[4])
#But note that if you need to express Y=Xb, the coefficient of b_0 must be X_0 
#which is 1
X_0<-as.data.frame(c(1,1,1,1,1,1,1,1,1,1))
X<-(cbind(X_0,data.df[1:3]))
names(X)<-c("X1","X2","X3","X4")
X<-as.matrix(X)
#Create copy for final evaluvations
A<-X  


x1<-as.matrix(X[,1])
deter_x<-sqrt(sum(x1^2))
n=dim(x1)[1]
deter_e1<-as.matrix(c(deter_x,rep(0,n-1)))
v1=x1+deter_e1
#c_i=2/(transpose(v_i)%*%(v_i))
c1<-as.numeric(2/(t(v1)%*%v1))
#H_i = I - c_i*v_i%*%transpose(v_i)
I<-diag(n)
H1<-I-c1*(v1%*%t(v1))
R1<-H1%*%X
#Check R1 and see if it is Upper Triangle Matrix
R1
#We will take rest of the interesting portion of matrix R1.
n=dim(R1)[1]
X<-as.matrix(as.data.frame(cbind(R1[2:n,2],R1[2:n,3],R1[2:n,4])))
x1<-as.matrix(X[,1])
deter_x<-sqrt(sum(x1^2))
n=dim(x1)[1]
deter_e1<-as.matrix(c(deter_x,rep(0,n-1)))
v1=x1-deter_e1
#c_i=2/(transpose(v_i)%*%(v_i))
c1<-as.numeric(2/(t(v1)%*%v1))
#H_i = I - c_i*v_i%*%transpose(v_i)
I<-diag(n)
H2<-I-c1*(v1%*%t(v1))
R2<-H2%*%X

#Check R2 and see if it is Upper Triangle Matrix, if no go for R3
n=dim(R2)[1]
X<-as.matrix(as.data.frame(cbind(R2[2:n,2],R2[2:n,3])))
x1<-as.matrix(X[,1])
deter_x<-sqrt(sum(x1^2))
n=dim(x1)[1]
deter_e1<-as.matrix(c(deter_x,rep(0,n-1)))
v1=x1+deter_e1
#c_i=2/(transpose(v_i)%*%(v_i))
c1<-as.numeric(2/(t(v1)%*%v1))
#H_i = I - c_i*v_i%*%transpose(v_i)
I<-diag(n)
H3<-I-c1*(v1%*%t(v1))
R3<-H3%*%X
R3

#Check R3 and see if it is Upper Triangle Matrix, if no go for R4
n=dim(R3)[1]
X<-as.matrix(as.data.frame(cbind(R3[2:n,2])))
x1<-as.matrix(X[,1])
deter_x<-sqrt(sum(x1^2))
n=dim(x1)[1]
deter_e1<-as.matrix(c(deter_x,rep(0,n-1)))
v1=x1-deter_e1
#c_i=2/(transpose(v_i)%*%(v_i))
c1<-as.numeric(2/(t(v1)%*%v1))
#H_i = I - c_i*v_i%*%transpose(v_i)
I<-diag(n)
H4<-I-c1*(v1%*%t(v1))
R4<-H4%*%X
R4
#As we can see R4 has all values except first element as zero
#Let us replace Matrices iteratively in R1 from R2 to R4 and round it of
R1[2:10,2:4]<-R2
R1[3:10,3:4]<-R3
R1[4:10,4]<-R4
R<-round(R1,5)
R
#Find Complete H1
#Q=H1%*%H2%*%H3%*%H4
H1_COM<-H1
# 
H_temp<-diag(10)
n=dim(H_temp)[1]
dim(H_temp[2:n,2:n])
dim(H2)
H_temp[2:n,2:n]<-H2
H2_COM<-H_temp
H2_COM
H_temp<-diag(10)
n=dim(H_temp)[1]
dim(H_temp[3:n,3:n])
dim(H3)
H_temp[3:n,3:n]<-H3
H3_COM<-H_temp
H3_COM

H_temp<-diag(10)
n=dim(H_temp)[1]
dim(H_temp[4:n,4:n])
dim(H4)
H_temp[4:n,4:n]<-H4
H4_COM<-H_temp
Q=H1_COM%*%H2_COM%*%H3_COM%*%H4_COM
# The following code properly reconstructs A Matrix proving proper Q & R
A=round(Q%*%R)
# When you try to find coefficients using Q&R you will end up in error.
solve(R)%*%t(Q)%*%Y
#Error in solve.default(R) : 'a' (10 x 4) must be square
#So trying to get matrix R without all 0 rows R[1:4,1:4]
solve(R[1:4,1:4])%*%t(Q)%*%Y
#Error in solve(R[1:4, 1:4]) %*% t(Q) : non-conformable arguments
dim(solve(R[1:4,1:4]))
dim(solve(R[1:4,1:4]))
#4 4
dim(t(Q))
#[1] 10 10
dim(Y)
#10  1

I would like to point you to this thread which I have answered (rather comprehensively): Multiple regression analysis in R using QR decomposition . 我想向您指出这个线程(我已经回答了(相当全面)): R中使用QR分解进行多元回归分析 Whether your question will be seen as a duplicate will be judged by the community. 您的问题是否会被视为重复项,将由社区进行判断。 Note, among the options there, the last one is using a QR factorization written in plain R code myself. 请注意,在其中的选项中,最后一个选项是使用我自己用纯R代码编写的QR因式分解。

Given that your toy code for QR factorization is correct (as you commented along your code), the major issue is with your last few lines. 鉴于您用于QR因式分解的玩具代码是正确的(正如您在代码中所做的评论一样),主要问题在于最后几行。

The solution is simple: 解决方案很简单:

solve(R) %*% (t(Q) %*% Y)[1:4,]

1:4 selects the first 4 elements in the single-column matrix t(Q) %*% Y . 1:4选择单列矩阵t(Q) %*% Y的前4个元素。

If you check with my linked answer, you will see that instead of solve , I am using backsolve as this is a triangular system of equations. 如果您查看我的链接答案,您会发现它不是solve ,而是使用backsolve因为这是一个三角方程组。

You will also spot that I use crossprod when I can instead of t and %*% . 您还将发现,我可以使用crossprod代替t%*% My recent answer When is 'crossprod' preferred to '%*%', and when isn't? 我最近的回答什么时候将'crossprod'首选为'%*%',什么时候不? has a comprehensive discussion on these two fashions. 对这两种方式进行了全面的讨论。

声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM