
gradient function of fmin_cg in scipy

I am trying to use the conjugate gradient algorithm (fmin_cg) from scipy to find parameters theta which give the best fit within a linear model.

Data file HouseData.csv (e.g. house area, house price):

120, 250
200, 467
250, 500
1200, 2598
1500, 3000

The code is:

from scipy import optimize
import numpy as np

data=np.genfromtxt('HouseData.csv',delimiter=',')
X=np.c_[np.ones(len(data)),data[:,:-1]]
Y=data[:,[-1]]

def cost_Function(theta):
    theta1=theta[np.newaxis].T
    #print('theta: ',theta1)
    cost = Y-np.dot(X, theta1)
    return (cost*cost).sum()

# Gradient Function
def gradf(theta):
    theta1 = theta[np.newaxis].T
    cost = Y - np.dot(X, theta1)
    #print('cost*X.sum(0) is', np.sum(cost*X,axis=0))
    return np.sum(cost*X,axis=0)


x0 = np.asarray((0,1)) #initial guess
result = optimize.fmin_cg(cost_Function,x0,fprime=gradf)
print(result)    

Without fprime=gradf the code returns the correct result, but what is the problem with the gradient function? When including it as above, the algorithm returns exactly the initial guess for theta. Is there anything else you would implement differently to improve performance? This is just a simple example, but the algorithm should also run with X having many columns and rows.

(Python 3.5.1, most recent scipy and numpy.)

Your gradient is clearly wrong.

Since your cost function is quadratic, we can approximate the gradient reasonably well with gradf(x) = (f(x + eps) - f(x - eps)) / (2 eps). Let's try that:

e0 = np.array([1, 0])
e1 = np.array([0, 1])
eps = 1e-5

x0 = np.array([1, 1])

df_yours = gradf(x0)
# array([  3.54000000e+03,   4.05583000e+06])

df_approx = np.array([
    cost_Function(x0 + eps*e0) - cost_Function(x0 - eps*e0),
    cost_Function(x0 + eps*e1) - cost_Function(x0 - eps*e1)
]) / (2 * eps)
# array([ -7.07999999e+03,  -8.11166000e+06])

Without doing mathematical analysis (which, by the way, you absolutely should be doing rather than guessing), your gradient function is off by a factor of -0.5. That negative sign is pretty critical.
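For the record, the analysis is one line: with f(theta) = sum((Y - X*theta)^2), the true gradient is -2 * X^T (Y - X*theta), while gradf above returns X^T (Y - X*theta), i.e. -0.5 times the true gradient, exactly the factor seen in the numerical check.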

Eric's comment regarding the sign of the gradient function was crucial. Here is the correctly working code, where np.dot(X, theta1) - Y is now correct and a factor of 0.5 was added to cost_Function:

from scipy import optimize
import numpy as np

data=np.genfromtxt('HouseData.csv',delimiter=',')
X=np.c_[np.ones(len(data)),data[:,:-1]]
Y=data[:,[-1]]

def cost_Function(theta):
    theta1=theta[np.newaxis].T
    cost = Y-np.dot(X, theta1)
    return 0.5*(cost*cost).sum()

# Gradient Function
def gradf(theta):
    theta1 = theta[np.newaxis].T
    cost = np.dot(X, theta1) - Y
    return np.sum(cost*X,axis=0)

x0 = np.asarray((0.1,2)) #initial guess

result = optimize.fmin_cg(cost_Function,x0,fprime=gradf)
print(result)
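As a side note, scipy ships a helper that automates the finite-difference comparison shown earlier; a minimal sketch, assuming cost_Function, gradf and np from the code above are in scope:

from scipy.optimize import check_grad

# check_grad returns the 2-norm of the difference between gradf and a
# finite-difference approximation of the gradient of cost_Function at x0.
# A value near zero (relative to the gradient's magnitude) means the
# analytic gradient is consistent; a large value flags a bug like the one above.
x0 = np.asarray((0.1, 2))
print(check_grad(cost_Function, gradf, x0))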
