
Gradient Descent

So I am writing a program that handles gradient descent. I'm using this method to solve equations of the form

Ax = b
where A is a random 10x10 matrix and b is a random 10x1 matrix

Here is my code:

import numpy as np
import math
import random

def steepestDistance(A, b, xO, e):
    # Steepest descent for A*x = b: iterate until the residual norm
    # ||A*x - b|| drops below the tolerance e.
    xPrev = xO
    dPrev = -((A * xPrev) - b)      # residual, used as the descent direction
    magdPrev = np.linalg.norm(dPrev)
    # exact line-search step length: ||d||^2 / (d^T * A * d)
    danger = np.asscalar((magdPrev * magdPrev) / (np.dot(dPrev.T, A * dPrev)))
    xNext = xPrev + (danger * dPrev)
    step = 1
    while (np.linalg.norm((A * xNext) - b) >= e and np.linalg.norm((A * xNext) - b) < math.pow(10, 4)):
        xPrev = xNext
        dPrev = -((A * xPrev) - b)
        magdPrev = np.linalg.norm(dPrev)
        danger = np.asscalar(math.pow(magdPrev, 2) / (np.dot(dPrev.T, A * dPrev)))
        xNext = xPrev + (danger * dPrev)
        step = step + 1
    return xNext

##print(steepestDistance(np.matrix([[5,2],[2,1]]),np.matrix([[1],[1]]),np.matrix([[0.5],[0]]), math.pow(10,-5)))

def chooseRandMatrix():
    # Fill a 10x10 matrix with random integers and return matrix.T * matrix.
    matrix = np.zeros(shape=(10, 10))
    for i in range(10):
        for a in range(10):
            matrix[i][a] = random.randint(0, 100)
    return matrix.T * matrix

def chooseRandColArray():
    # Random 10x1 column vector.
    arra = np.zeros(shape=(10, 1))
    for i in range(10):
        arra[i][0] = random.randint(0, 100)
    return arra

for i in range(4):
    matrix = np.asmatrix(chooseRandMatrix())
    array = np.asmatrix(chooseRandColArray())
print(steepestDistance(matrix, array, np.asmatrix(chooseRandColArray()), math.pow(10, -5)))

When I run the method steepestDistance on the random matrix and column, I keep getting an infinite loop. It works fine when simple 2x2 matrices are used for A, but it loops indefinitely for 10x10 matrices. The problem is in np.linalg.norm((A * xNext) - b); it keeps growing indefinitely. That's why I put an upper bound on it; I'm not supposed to do that for the algorithm, however. Can someone tell me what the problem is?

Solving a linear system Ax = b with gradient descent means minimizing the quadratic function

f(x) = 0.5*x^t*A*x - b^t*x. 

This only works if the matrix A is symmetric, A = A^t, since the derivative or gradient of f is

f'(x)^t = 0.5*(A+A^t)*x - b, 

and additionally A must be positive definite. If there are negative eigenvalues, then the descent will proceed towards minus infinity; there is no minimum to be found.
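
As a sanity check, here is a minimal self-contained sketch (not the poster's code; the construction A = M^t*M + I, the random seed, and the tolerances are assumptions chosen for illustration) that runs the same steepest-descent iteration on a matrix that is symmetric positive definite by construction. The residual shrinks instead of blowing up; note that @ denotes a true matrix product here.

import numpy as np

rng = np.random.default_rng(0)
M = rng.integers(0, 100, size=(10, 10)).astype(float)
A = M.T @ M + np.eye(10)        # symmetric positive definite by construction
b = rng.integers(0, 100, size=(10, 1)).astype(float)

x = np.zeros((10, 1))
for step in range(10000):
    r = b - A @ x                                     # residual = negative gradient of f
    if np.linalg.norm(r) < 1e-5:
        break
    alpha = (r.T @ r).item() / (r.T @ A @ r).item()   # exact line-search step
    x = x + alpha * r

print(step, np.linalg.norm(A @ x - b))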


One work-around is to replace b by A^t*b and A by A^t*A, that is, to minimize the function

f(x) = 0.5*||A*x-b||^2
     = 0.5*x^t*A^t*A*x - b^t*A*x + 0.5*b^t*b

with gradient

f'(x)^t = A^t*A*x - A^t*b

But for large matrices A this is not recommended, since the condition number of A^t*A is about the square of the condition number of A.
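
Here is a rough sketch of that work-around under assumed test data (a random non-symmetric A and b, an arbitrary tolerance and iteration cap): gradient descent on f(x) = 0.5*||A*x - b||^2 with gradient A^t*(A*x - b).

import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((10, 10))      # generic, non-symmetric A
b = rng.standard_normal((10, 1))

x = np.zeros((10, 1))
for step in range(100000):
    g = A.T @ (A @ x - b)                          # gradient of 0.5*||A*x - b||^2
    if np.linalg.norm(g) < 1e-8:
        break
    # exact line search for this quadratic: alpha = ||g||^2 / ||A*g||^2
    alpha = (g.T @ g).item() / (np.linalg.norm(A @ g) ** 2)
    x = x - alpha * g

print(step, np.linalg.norm(A @ x - b))

As said, convergence can be slow, since the effective condition number is roughly the square of the condition number of A.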
