
gradient function of fmin_cg in scipy

I am trying to use the conjugate gradient algorithm (fmin_cg) from scipy to find parameters theta which give the best fit within a linear model.

Data file HouseData.csv (e.g. house area, house price):

120, 250
200, 467
250, 500
1200, 2598
1500, 3000

The code is:

from scipy import optimize
import numpy as np

data=np.genfromtxt('HouseData.csv',delimiter=',')
X=np.c_[np.ones(len(data)),data[:,:-1]]
Y=data[:,[-1]]

def cost_Function(theta):
    theta1=theta[np.newaxis].T
    #print('theta: ',theta1)
    cost = Y-np.dot(X, theta1)
    return (cost*cost).sum()

# Gradient Function
def gradf(theta):
    theta1 = theta[np.newaxis].T
    cost = Y - np.dot(X, theta1)
    #print('cost*X.sum(0) is', np.sum(cost*X,axis=0))
    return np.sum(cost*X,axis=0)


x0 = np.asarray((0,1)) #initial guess
result = optimize.fmin_cg(cost_Function,x0,fprime=gradf)
print(result)    

Without fprime=gradf the code returns the correct result, but what is the problem with the gradient function? When including it as above, the algorithm returns exactly the initial guess for theta. Is there anything else you would implement differently to improve performance? This is just a simple example, but the algorithm should also run with X having many columns and rows.

(Python 3.5.1, most recent scipy and numpy.)

Your gradient is clearly wrong.

Since your cost function is quadratic, we can approximate the gradient reasonably well with gradf(x) = (f(x + eps) - f(x - eps)) / (2 eps). Let's try that:

e0 = np.array([1, 0])
e1 = np.array([0, 1])
eps = 1e-5

x0 = np.array([1, 1])

df_yours = gradf(x0)
# array([  3.54000000e+03,   4.05583000e+06])

df_approx = np.array([
    cost_Function(x0 + eps*e0) - cost_Function(x0 - eps*e0),
    cost_Function(x0 + eps*e1) - cost_Function(x0 - eps*e1)
]) / (2 * eps)
# array([ -7.07999999e+03,  -8.11166000e+06])

Without doing mathematical analysis (which, by the way, you absolutely should be doing rather than guessing), your gradient function is off by a factor of -0.5. That negative sign is pretty critical.
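For the record, the analysis is one line: with f(theta) = sum((Y - X*theta)^2), the true gradient is -2 * X^T (Y - X*theta), while gradf above returns X^T (Y - X*theta), i.e. -0.5 times the true gradient, exactly the factor seen in the numerical check.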

Eric's comment regarding the sign of the gradient function was crucial. Here is the correctly working code, where np.dot(X, theta1) - Y is now correct and a factor of 0.5 was added to cost_Function:

from scipy import optimize
import numpy as np

data=np.genfromtxt('HouseData.csv',delimiter=',')
X=np.c_[np.ones(len(data)),data[:,:-1]]
Y=data[:,[-1]]

def cost_Function(theta):
    theta1=theta[np.newaxis].T
    cost = Y-np.dot(X, theta1)
    return 0.5*(cost*cost).sum()

# Gradient Function
def gradf(theta):
    theta1 = theta[np.newaxis].T
    cost = np.dot(X, theta1) - Y
    return np.sum(cost*X,axis=0)

x0 = np.asarray((0.1,2)) #initial guess

result = optimize.fmin_cg(cost_Function,x0,fprime=gradf)
print(result)
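As a side note, scipy ships a helper that automates the finite-difference comparison shown earlier; a minimal sketch, assuming cost_Function, gradf and np from the code above are in scope:

from scipy.optimize import check_grad

# check_grad returns the 2-norm of the difference between gradf and a
# finite-difference approximation of the gradient of cost_Function at x0.
# A value near zero (relative to the gradient's magnitude) means the
# analytic gradient is consistent; a large value flags a bug like the one above.
x0 = np.asarray((0.1, 2))
print(check_grad(cost_Function, gradf, x0))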
