

Different Python minimization functions give different values, why?

I'm trying to learn Python by rewriting the assignments from Andrew Ng's Machine Learning course from Octave (I took the class and got the certificate). I'm having issues with the optimization functions. In the course they use fmincg, a function used in the Octave version to minimize the (convex) cost function of linear regression given its derivative. They also teach gradient descent and the normal equation, which in theory all give the same result (to within a few decimal places) when used correctly. They all work great for linear regression, and I get the same results in Python. To be clear, I'm trying to minimize the cost function to find the best-fitting parameters (theta) for the data set.

So far I've used 'Nelder-Mead', which doesn't need the derivative, and it gave me the solution closest to theirs. I've also tried 'TNC', 'CG' and 'BFGS', which all require a derivative to minimize the function. They all work great when I have a first-order (linear) polynomial, but when I increase the order of the polynomial to something non-linear, in my case x^1 up to x^8, I can't get my function to fit the data set. The exercise I'm doing is really simple: I have 12 data points, so an 8th-order polynomial should capture every single point (if you're curious, it's an example of high variance, i.e. overfitting the data). The solution they show is a line that goes through all the data points as expected and captures everything. The best I got was with the 'Nelder-Mead' method, and it only captured two points of the data set, while the rest of the minimization functions didn't even give me anything close to what I'm looking for. I'm not sure what's wrong, because my cost function and gradients give the right values for the linear case, so I'm assuming they're working fine (they match Octave's answers exactly).

I'm going to list the functions in both Octave and Python in the hope that someone can explain why I'm getting different answers, or point out the obvious error that I'm not seeing.

function [J, grad] = linearRegCostFunction(X, y, theta, lambda)
%LINEARREGCOSTFUNCTION Compute cost and gradient for regularized linear 
%regression with multiple variables
%   [J, grad] = LINEARREGCOSTFUNCTION(X, y, theta, lambda) computes the 
%   cost of using theta as the parameter for linear regression to fit the 
%   data points in X and y. Returns the cost in J and the gradient in grad


m = length(y); % number of training examples 
J = 0;
grad = zeros(size(theta));

htheta = X * theta;
n = size(theta, 1);
J = 1 / (2 * m) * sum((htheta - y) .^ 2) + lambda / (2 * m) * sum(theta(2:n) .^ 2);

grad = 1 / m * X' * (htheta - y);
grad(2:n) = grad(2:n) + lambda / m * theta(2:n); % do not regularize the bias term
grad = grad(:);

end

Here are snippets of my code; if anyone would like the full code, I can provide that as well:

import numpy as np
import matplotlib.pyplot as plt
from scipy import optimize


def costFunction(theta, Xcost, y, lmda):
    # regularized linear regression cost (Python version of linearRegCostFunction)
    m = len(y)
    theta = theta.reshape((len(theta),1))
    htheta = np.dot(Xcost,theta) - y 
    J = 1 / (2 * m) * np.dot(htheta.T,htheta) + lmda / (2 * m) * np.sum(theta[1:,:]**2)
    return J

def gradCostFunc(gradtheta, X, y, lmda):
    # gradient of the regularized cost, returned as a flat 1-D array
    m = len(y)
    gradtheta = gradtheta.reshape((len(gradtheta),1))
    hgradtheta = np.dot(X,gradtheta) - y 
    #gradtheta[0,0] = 0. 

    grad = (1 / m) * np.dot(X.T, hgradtheta)

    #for i in range(1,len(grad)):
    grad[1:,0] = grad[1:,0] + (lmda/m) * gradtheta[1:,0]
    return grad.reshape((len(grad)))

def normalEqn(X, y, lmda):
    # closed-form solution from the regularized normal equation
    e = np.eye(X.shape[1])
    e[0,0] = 0
    theta = np.dot(np.linalg.pinv(np.dot(X.T,X) + lmda * e),np.dot(X.T,y))
    return theta 

def gradientDescent(X, y, theta, alpha, lmda, num_iters):
    # calculate gradient descent in an iterative manner
    m = len(y)
    # J_history tracks the evolution of the cost function 
    J_history = np.zeros((num_iters,1))

    # Calculating the gradients 
    for i in range(0, num_iters):
        grad = np.zeros((len(theta),1))
        grad = gradCostFunc(theta, X, y, lmda)
        #updating the thetas 
        theta = theta - alpha * grad 
        J_history[i] = costFunction(theta, X, y, lmda)

    plt.plot(J_history)
    plt.show()

    return theta 

def trainLR(initheta, X, y, lmda):
    options = {'maxiter': 1000}
    res = optimize.minimize(costFunction, initheta, jac=gradCostFunc, method='CG',
                            args=(X, y, lmda), options=options)
    #res = optimize.minimize(costFunction, initheta, method='nelder-mead',
    #                        args=(X, y, lmda), options={'disp': False})
    #res = optimize.fmin_bfgs(costFunction, initheta, fprime=gradCostFunc, args=(X, y, lmda))
    return res.x

def polyFeatures(X, degree):
    # map X onto its higher-order polynomial features (X, X^2, ..., X^degree)
    out = X 
    if degree >= 2:
        for i in range(2,degree+1):
            out = np.column_stack((out,X**i))
        return out 
    else:
        return out

def featureNormalize(X):
    # since the feature values vary by orders of magnitude,
    # it's important to normalize them
    mu = np.mean(X, axis=0)
    S1 = np.std(X, axis=0)
    return mu, S1, (X - mu)/S1
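
For reference, the scipy documentation asks for a 1-D x0, an objective that returns a plain scalar, and a jac that returns a 1-D array of the same length as x0. Here is a minimal sanity check of the two functions above with made-up numbers (the values themselves are arbitrary):

# tiny made-up data set: 3 examples, bias column plus one feature
Xs = np.column_stack((np.ones(3), np.array([1., 2., 3.])))
ys = np.array([[1.], [2.], [3.]])
t0 = np.zeros(2)

J = costFunction(t0, Xs, ys, 0.)
g = gradCostFunc(t0, Xs, ys, 0.)
print(np.shape(J))  # minimize wants a scalar here; a (1, 1) array can trip up some methods
print(np.shape(g))  # should be (2,), matching the flattened theta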

And here is the main call to these functions:

X, y, Xval, yval, Xtest, ytest = loadData('ex5data1.mat')
X_poly = X  # to be used later on in the program
p = 8 
X_poly = polyFeatures(X_poly, p)
mu, sigma, X_poly = featureNormalize(X_poly)
X_poly = padding(X_poly)
theta = np.zeros((X_poly.shape[1],1))
theta = trainLR(theta, X_poly, y, 0.)
#theta = normalEqn(X_poly, y, 0.)
#theta = gradientDescent(X_poly, y, theta, 0.1, 0, 1500)
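
For what it's worth, one way to sanity-check the analytic gradient against a numerical estimate is scipy.optimize.check_grad, which returns the norm of the difference between the two; a small value suggests the gradient is consistent with the cost. A minimal sketch using the arrays above (the wrapper lambdas and the random test point are only for illustration):

t_test = np.random.randn(X_poly.shape[1])
cost_scalar = lambda t: np.asarray(costFunction(t, X_poly, y, 0.)).ravel()[0]  # force a scalar cost
grad_flat = lambda t: np.ravel(gradCostFunc(t, X_poly, y, 0.))                 # force a flat gradient
print(optimize.check_grad(cost_scalar, grad_flat, t_test))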

My answer is probably off-point, because your question was asking for help debugging your current implementation.

That said, if you're interested in using ready-made optimisers in Python, then have a look at OpenOpt. The library contains reasonably performant implementations of optimisers for a wide variety of optimisation problems.

I should also mention that the scikit-learn library provides a nice machine-learning toolset for Python.
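
For example, a regularized polynomial regression like the one above can be written in a few lines of scikit-learn. This is only a rough sketch, not a drop-in replacement: the placeholder data is made up, Ridge's alpha plays the role of lambda, and the exact regularization term differs slightly from the course's cost function.

import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler
from sklearn.linear_model import Ridge

X = np.random.rand(12, 1)   # placeholder for the 12-point training set from ex5data1.mat
y = np.random.rand(12)

model = make_pipeline(
    PolynomialFeatures(degree=8, include_bias=False),  # x^1 .. x^8, like polyFeatures
    StandardScaler(),                                  # like featureNormalize
    Ridge(alpha=0.0),                                  # alpha plays the role of lambda
)
model.fit(X, y)
print(model.predict(X))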
