
Gradient descent method does not work well in linear regression?

In R, I generated some artificial data to perform a linear regression using the gradient descent method: Y = c0 + c1*x1 + c2*x2 + noise

I also used an analytical method to calculate the parameters theta = [c0, c1, c2]. Below is the R code, with comments explaining the variables.

I use the gradient descent method to calculate theta. The formulae are taken from the links below.

slides from a person

slides from Stanford - Andrew Ng
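
For reference, the cost function and update rule that the code below implements (written out from the code itself, so this is my reading rather than the slides' exact notation) are:

$$J(\theta) = \frac{1}{2N}\sum_{i=1}^{N}\bigl(\theta x_i^{\top} - y_i\bigr)^2, \qquad \nabla J(\theta) = \frac{1}{N}\bigl(\theta X^{\top} - Y^{\top}\bigr)X, \qquad \theta \leftarrow \theta - \eta\,\nabla J(\theta)$$

where theta is a 1 x np row vector, X is the N x np design matrix (first column all ones), and x_i is the i-th row of X.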

However, the method is unable to converge. My R code is below; theta is very different from the analytical solution k computed in the code.

rm(list = ls())

n=500
x1=rnorm(n,mean=4,sd=1.6)
x2=rnorm(n,mean=4,sd=2.5)


X=cbind(x1,x2)
A=as.matrix(cbind(rep(1,n),x1,x2))
Y=-3.9+3.8*x1-2.4*x2+rnorm(n,mean=0,sd=1.5);


k=solve(t(A)%*%A,t(A)%*%Y) # k is the parameters determined by analytical method
MSE=sum((A%*%k-Y)^2)/(n);

iterations=3000 # total number of steps
epsilon = 0.0001 # convergence precision
eta=0.0001 # step size (learning rate)

t1=numeric(iterations) # stores the L2 norm of each update step
e1=numeric(iterations) # stores the cost at each step

X=as.matrix(X) # ensure X is a matrix
N=dim(X)[1] # total number of observations
X=as.matrix(cbind(rep(1,N),X)) # add a column of ones to represent the intercept
np=dim(X)[2] # number of parameters to be determined
theta=matrix(rnorm(n=np,mean=0,sd=1),1,np) # Initialize theta:1 x np matrix
for(i in 1:iterations){
  error =theta%*%t(X)-t(Y) # error = (theta * x' -Y'). Error is a 1xN row vector;
  grad=(1/N)*error%*%X # Gradient grad is 1 x np vector
  theta=theta-eta*grad # updating theta
  L=sqrt(sum((eta*grad)^2)) # L2 norm of the update step eta*grad
  e1[i]=sum((error)^2)/(2*N) # record the cost at each step (equal to MSE/2)
  t1[i]=L # record the L2 norm at each step
  if(L<=epsilon){ # checking whether convergence is obtained or not
    break
  }
}

plot(e1*2,type="l",ylab="MSE",lwd=2,col=rgb(0,0,1))
abline(h=MSE)
text(x=1000,y=MSE+1,labels = "Actual MSE",adj=1)
text(x=500,y=15,labels = "Gradient Descent",adj=0.4)
theta
k

I've tried the code, and the problem was that after 3000 iterations the L2 norm L was still greater than the precision epsilon. Reproducible example data can be obtained by running set.seed(5556) first. After 7419 iterations, L = 9.99938e-5 < epsilon. However, theta is still different from the expected result, which can of course be computed by running lm(Y ~ x1 + x2).
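
For reference, a minimal check of the expected coefficients with lm, run after set.seed(5556) and the script above (the row labels below are just my own names):

fit <- lm(Y ~ x1 + x2)   # ordinary least squares on the same data
coef(fit)                # should agree with the analytical solution k
rbind(analytical = as.vector(k), gradient_descent = as.vector(theta)) # side-by-side comparison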

Your problem is that you set eta very conservatively, so it takes a very long time to converge. Eta is crucial for convergence speed; if it is chosen too large, however, the algorithm might not converge at all. You will likely hear later in the course about algorithms which automatically adjust eta, such as Adagrad or Adam (see the sketch after the plots below). If you choose eta = 0.001:

[plot for eta = 0.001]

eta = 0.01

[plot for eta = 0.01]

eta = 0.1

[plot for eta = 0.1]
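
To illustrate the adaptive-step-size idea, here is a minimal Adagrad-style sketch of the same loop. This is only an illustration under standard Adagrad conventions; the accumulator G, the smoothing constant 1e-8, and the base step eta = 0.5 are my choices, not from the question's code:

# Minimal Adagrad-style sketch for the same regression (illustrative only)
set.seed(5556)
n <- 500
x1 <- rnorm(n, mean = 4, sd = 1.6)
x2 <- rnorm(n, mean = 4, sd = 2.5)
X <- cbind(rep(1, n), x1, x2)                     # design matrix with intercept column
Y <- -3.9 + 3.8 * x1 - 2.4 * x2 + rnorm(n, mean = 0, sd = 1.5)

np <- ncol(X)
theta <- matrix(rnorm(np), 1, np)                 # 1 x np row vector, random start
eta <- 0.5                                        # base step size (tunable)
G <- matrix(0, 1, np)                             # running sum of squared gradients

for (i in 1:3000) {
  error <- theta %*% t(X) - t(Y)                  # 1 x n residual row vector
  grad  <- (1 / n) * error %*% X                  # 1 x np gradient
  G     <- G + grad^2                             # accumulate squared gradients
  theta <- theta - (eta / sqrt(G + 1e-8)) * grad  # per-parameter step size shrinks over time
}
theta                                             # compare with solve(t(X)%*%X, t(X)%*%Y)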
