
Why am I getting a negative cost function for logistic regression using gradient descent in Python?

I'm trying to apply what I've learned in Andrew Ng's Coursera course. I've successfully implemented this same algorithm, the same way I'm doing it here, on the Kaggle Titanic dataset, but now with this data (UFC fights) I'm getting a negative cost. I've stripped the dataset down to only two features (the opponent and the round the fight ended in), then took their z-scores.
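(The standardization step isn't shown in the post; as a hedged sketch with made-up raw values, it presumably looks something like this:)

import numpy as np

# Hypothetical preprocessing, not from the original post: z-score each raw
# feature column, then prepend a bias column of ones to form the design matrix.
raw = np.array([[12.0, 3.0],
                [ 7.0, 1.0],
                [ 9.0, 2.0]])          # made-up (opponent encoding, round) pairs
z_scored = (raw - raw.mean(axis=0)) / raw.std(axis=0)
X = np.hstack([np.ones((len(z_scored), 1)), z_scored])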

This is my design matrix (it's actually much bigger, but I get the same negative cost when it's this small):

array([[ 1.        , -0.50373455, -0.35651205],
       [ 1.        , -1.54975476,  0.84266484],
       [ 1.        ,  0.63737841, -1.55568894],
       [ 1.        ,  1.11284214,  0.84266484],
       [ 1.        , -1.07429103,  0.84266484],
       [ 1.        , -1.07429103, -1.55568894],
       [ 1.        ,  0.25700742,  0.84266484],
       [ 1.        , -1.83503301, -0.35651205],
       [ 1.        ,  1.20793489, -0.35651205],
       [ 1.        ,  1.58830588, -1.55568894],
       [ 1.        , -1.16938378,  0.84266484],
       [ 1.        , -0.78901279, -0.35651205],
       [ 1.        , -0.50373455, -1.55568894],
       [ 1.        ,  1.0177494 , -0.35651205],
       [ 1.        , -0.21845631,  0.84266484],
       [ 1.        ,  0.92265665, -1.55568894],
       [ 1.        ,  0.06682193,  0.84266484],
       [ 1.        ,  1.30302764, -0.35651205],
       [ 1.        ,  0.44719292, -0.35651205],
       [ 1.        , -0.69392004,  0.84266484],
       [ 1.        ,  1.39812038, -1.55568894],
       [ 1.        , -0.97919828,  0.84266484],
       [ 1.        ,  0.16191468,  0.84266484],
       [ 1.        , -1.54975476,  0.84266484],
       [ 1.        , -0.02827082,  0.84266484],
       [ 1.        ,  0.63737841, -0.35651205],
       [ 1.        , -0.88410554,  0.84266484],
       [ 1.        ,  0.06682193,  0.84266484],
       [ 1.        , -1.73994026,  0.84266484],
       [ 1.        , -0.12336356,  0.84266484],
       [ 1.        , -0.97919828,  0.84266484],
       [ 1.        ,  0.8275639 , -1.55568894],
       [ 1.        ,  0.73247116,  0.84266484],
       [ 1.        ,  1.68339863, -1.55568894],
       [ 1.        ,  0.35210017, -1.55568894],
       [ 1.        , -0.02827082,  0.84266484],
       [ 1.        ,  1.30302764,  0.84266484]])

My weights vector is initialized to all zeros:

array([[0.],
       [0.],
       [0.]])

For completeness, here's the Y vector:

array([[0],
       [0],
       [1],
       [1],
       [0],
       [0],
       [1],
       [0],
       [0],
       [1],
       [0],
       [0],
       [1],
       [0],
       [1],
       [0],
       [1],
       [0],
       [1],
       [1],
       [0],
       [1],
       [1],
       [0],
       [0],
       [1],
       [1],
       [1],
       [1],
       [0],
       [0],
       [1],
       [1],
       [1],
       [1],
       [0],
       [1]], dtype=uint8)

These are my cost function and sigmoid/predict functions:

import numpy as np

def cost_function(X, Y, theta):
    # Vectorized cross-entropy (log-loss) cost, averaged over the m examples.
    m = len(Y)
    h = predict(X, theta)
    cost = (np.dot((-Y.T), np.log(h)) - np.dot((1-Y).T, np.log(1-h))) / m
    return cost

def sigmoid(z):
    return 1/(1+np.e**(-z))

def predict(X, theta):
    z = np.dot(X, theta)
    return sigmoid(z)
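One quick sanity check (my own suggestion, not in the original post): with theta all zeros, h is 0.5 for every row, so the initial cost should be exactly -log(0.5) ≈ 0.693 no matter what the data looks like; a negative value here already signals a bug.

import numpy as np

# Sanity check using the functions defined above, on made-up inputs:
# sigmoid(X @ 0) is 0.5 everywhere, so the cross-entropy cost must be
# -log(0.5) ~= 0.6931 for any X and any 0/1 labels Y.
X_check = np.array([[1.0, -0.5, -0.36],
                    [1.0,  0.6,  0.84]])
Y_check = np.array([[0], [1]], dtype=np.int64)   # signed ints, not uint8
theta_check = np.zeros((3, 1))

print(cost_function(X_check, Y_check, theta_check))  # expect [[0.69314718]]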

Here's the gradient descent function:

def gradient_descent(X, Y, theta, rate):
    # One batch gradient-descent step; the learning rate is folded into
    # the gradient before the in-place update of theta.
    m = len(Y)
    h = predict(X, theta)

    gradient = rate * np.dot(X.T, (h-Y)) / m
    theta -= gradient
    return theta
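For reference, this is the standard vectorized batch update from the course (with the learning rate $\alpha$ already folded into `gradient` before the subtraction):

$$\theta := \theta - \frac{\alpha}{m} X^\top \left( h_\theta(X) - y \right)$$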

Then I use this train function to call both of them over n iterations.

def train(X, Y, theta, rate, iters):
    cost_history = []

    for i in range(iters):
        theta = gradient_descent(X, Y, theta, rate)

        cost = cost_function(X, Y, theta)
        cost_history.append(cost)

        if i % 100 == 0:
            print("iter: " + str(i) + " cost: " + str(cost))
    return theta, cost_history
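A minimal end-to-end run of these functions might look like the following (a sketch, not from the original post; X and Y are the arrays shown above, and the rate/iteration values are assumed for illustration):

import numpy as np

# Assumed driver code: cast Y to signed ints (see the fix at the end),
# start from zero weights, and train for a fixed number of iterations.
theta0 = np.zeros((X.shape[1], 1))
theta, cost_history = train(X, Y.astype(np.int64), theta0, rate=0.1, iters=1000)

print(cost_history[-1])  # the cost should be positive and decreasing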

Then at the end of this I end up with a cost history that looks like this: [image: plot of the cost over iterations, with the values going negative]

That's what I'm having trouble understanding. Why is it negative? Is it a problem with the code or the data, or is this how it's supposed to work and I'm missing something? I've been trying to figure it out for the last day but haven't gotten anywhere. With just these features, the model still correctly predicts the outcome of the fight about 54% of the time on the test set (using the weights after training with the functions above), but the cost is negative.

Okay, after some more troubleshooting I found the problem. I'm not sure why it was causing the problem, but fixing it puts my cost function back to normal.

The Y vector's dtype was uint8, and that apparently causes problems somewhere down the line. Changing it to int64 fixed everything. Sorry, I don't know why it causes the problem, but if I find out I'll edit it into my answer.
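Although the answer leaves the "why" open, the behavior is consistent with NumPy's unsigned-integer arithmetic: negating a uint8 array wraps around modulo 256, so the -Y.T inside cost_function turns every 1 into 255. A small demonstration (my own sketch, not part of the original post):

import numpy as np

Y_u8  = np.array([[0], [1], [1]], dtype=np.uint8)
Y_i64 = Y_u8.astype(np.int64)

# uint8 cannot represent -1, so negation wraps modulo 256.
print(-Y_u8.T)    # [[  0 255 255]]  <- every 1 became 255
print(-Y_i64.T)   # [[ 0 -1 -1]]     <- what the cost function needs

# Inside cost_function, np.dot(-Y.T, np.log(h)) therefore multiplies the
# (negative) log-probabilities by 255 instead of -1, which is what drives
# the reported cost far below zero.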

