
Why am I getting a negative cost function for logistic regression using gradient descent in Python?

I'm trying to apply what I've learned in Andrew Ng's Coursera course. I've successfully implemented this same algorithm, the same way I'm doing it here, on the Kaggle Titanic dataset, but now with this data (UFC fights) I'm getting a negative cost. I've stripped the dataset down to only two features (the opponent and the round the fight ended in), then took their z-scores.
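(The standardization step isn't shown in the post; as a hedged sketch with made-up raw values, it presumably looks something like this:)

import numpy as np

# Hypothetical preprocessing, not from the original post: z-score each raw
# feature column, then prepend a bias column of ones to form the design matrix.
raw = np.array([[12.0, 3.0],
                [ 7.0, 1.0],
                [ 9.0, 2.0]])          # made-up (opponent encoding, round) pairs
z_scored = (raw - raw.mean(axis=0)) / raw.std(axis=0)
X = np.hstack([np.ones((len(z_scored), 1)), z_scored])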

This is my design matrix (it's actually much bigger, but I get the same negative cost when it's this small):

array([[ 1.        , -0.50373455, -0.35651205],
       [ 1.        , -1.54975476,  0.84266484],
       [ 1.        ,  0.63737841, -1.55568894],
       [ 1.        ,  1.11284214,  0.84266484],
       [ 1.        , -1.07429103,  0.84266484],
       [ 1.        , -1.07429103, -1.55568894],
       [ 1.        ,  0.25700742,  0.84266484],
       [ 1.        , -1.83503301, -0.35651205],
       [ 1.        ,  1.20793489, -0.35651205],
       [ 1.        ,  1.58830588, -1.55568894],
       [ 1.        , -1.16938378,  0.84266484],
       [ 1.        , -0.78901279, -0.35651205],
       [ 1.        , -0.50373455, -1.55568894],
       [ 1.        ,  1.0177494 , -0.35651205],
       [ 1.        , -0.21845631,  0.84266484],
       [ 1.        ,  0.92265665, -1.55568894],
       [ 1.        ,  0.06682193,  0.84266484],
       [ 1.        ,  1.30302764, -0.35651205],
       [ 1.        ,  0.44719292, -0.35651205],
       [ 1.        , -0.69392004,  0.84266484],
       [ 1.        ,  1.39812038, -1.55568894],
       [ 1.        , -0.97919828,  0.84266484],
       [ 1.        ,  0.16191468,  0.84266484],
       [ 1.        , -1.54975476,  0.84266484],
       [ 1.        , -0.02827082,  0.84266484],
       [ 1.        ,  0.63737841, -0.35651205],
       [ 1.        , -0.88410554,  0.84266484],
       [ 1.        ,  0.06682193,  0.84266484],
       [ 1.        , -1.73994026,  0.84266484],
       [ 1.        , -0.12336356,  0.84266484],
       [ 1.        , -0.97919828,  0.84266484],
       [ 1.        ,  0.8275639 , -1.55568894],
       [ 1.        ,  0.73247116,  0.84266484],
       [ 1.        ,  1.68339863, -1.55568894],
       [ 1.        ,  0.35210017, -1.55568894],
       [ 1.        , -0.02827082,  0.84266484],
       [ 1.        ,  1.30302764,  0.84266484]])

My weights vector is initialized to all zeros:

array([[0.],
       [0.],
       [0.]])

For completeness, here's the Y vector:

array([[0],
       [0],
       [1],
       [1],
       [0],
       [0],
       [1],
       [0],
       [0],
       [1],
       [0],
       [0],
       [1],
       [0],
       [1],
       [0],
       [1],
       [0],
       [1],
       [1],
       [0],
       [1],
       [1],
       [0],
       [0],
       [1],
       [1],
       [1],
       [1],
       [0],
       [0],
       [1],
       [1],
       [1],
       [1],
       [0],
       [1]], dtype=uint8)

These are my cost function and sigmoid/predict functions:

import numpy as np

def cost_function(X, Y, theta):
    # Vectorized cross-entropy (log-loss) cost, averaged over the m examples.
    m = len(Y)
    h = predict(X, theta)
    cost = (np.dot((-Y.T), np.log(h)) - np.dot((1-Y).T, np.log(1-h))) / m
    return cost

def sigmoid(z):
    return 1/(1+np.e**(-z))

def predict(X, theta):
    z = np.dot(X, theta)
    return sigmoid(z)
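One quick sanity check (my own suggestion, not in the original post): with theta all zeros, h is 0.5 for every row, so the initial cost should be exactly -log(0.5) ≈ 0.693 no matter what the data looks like; a negative value here already signals a bug.

import numpy as np

# Sanity check using the functions defined above, on made-up inputs:
# sigmoid(X @ 0) is 0.5 everywhere, so the cross-entropy cost must be
# -log(0.5) ~= 0.6931 for any X and any 0/1 labels Y.
X_check = np.array([[1.0, -0.5, -0.36],
                    [1.0,  0.6,  0.84]])
Y_check = np.array([[0], [1]], dtype=np.int64)   # signed ints, not uint8
theta_check = np.zeros((3, 1))

print(cost_function(X_check, Y_check, theta_check))  # expect [[0.69314718]]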

Here's the gradient descent function:

def gradient_descent(X, Y, theta, rate):
    # One batch gradient-descent step; the learning rate is folded into
    # the gradient before the in-place update of theta.
    m = len(Y)
    h = predict(X, theta)

    gradient = rate * np.dot(X.T, (h-Y)) / m
    theta -= gradient
    return theta
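For reference, this is the standard vectorized batch update from the course (with the learning rate $\alpha$ already folded into `gradient` before the subtraction):

$$\theta := \theta - \frac{\alpha}{m} X^\top \left( h_\theta(X) - y \right)$$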

Then I use this train function to call both of them over n iterations.

def train(X, Y, theta, rate, iters):
    cost_history = []

    for i in range(iters):
        theta = gradient_descent(X, Y, theta, rate)

        cost = cost_function(X, Y, theta)
        cost_history.append(cost)

        if i % 100 == 0:
            print("iter: " + str(i) + " cost: " + str(cost))
    return theta, cost_history
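A minimal end-to-end run of these functions might look like the following (a sketch, not from the original post; X and Y are the arrays shown above, and the rate/iteration values are assumed for illustration):

import numpy as np

# Assumed driver code: cast Y to signed ints (see the fix at the end),
# start from zero weights, and train for a fixed number of iterations.
theta0 = np.zeros((X.shape[1], 1))
theta, cost_history = train(X, Y.astype(np.int64), theta0, rate=0.1, iters=1000)

print(cost_history[-1])  # the cost should be positive and decreasing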

Then at the end of this I end up with a cost history that looks like this: [image: plot of the cost over iterations, with the values going negative]

That's what I'm having trouble understanding. Why is it negative? Is it a problem with the code or the data, or is this how it's supposed to work and I'm missing something? I've been trying to figure it out for the last day but haven't gotten anywhere. With just these features, the model still correctly predicts the outcome of the fight about 54% of the time on the test set (using the weights after training with the functions above), but the cost is negative.

Okay, after some more troubleshooting I found the problem. I'm not sure why it was causing the problem, but fixing it puts my cost function back to normal.

The Y vector's dtype was uint8, and that apparently causes problems somewhere down the line. Changing it to int64 fixed everything. Sorry, I don't know why it causes the problem, but if I find out I'll edit it into my answer.
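Although the answer leaves the "why" open, the behavior is consistent with NumPy's unsigned-integer arithmetic: negating a uint8 array wraps around modulo 256, so the -Y.T inside cost_function turns every 1 into 255. A small demonstration (my own sketch, not part of the original post):

import numpy as np

Y_u8  = np.array([[0], [1], [1]], dtype=np.uint8)
Y_i64 = Y_u8.astype(np.int64)

# uint8 cannot represent -1, so negation wraps modulo 256.
print(-Y_u8.T)    # [[  0 255 255]]  <- every 1 became 255
print(-Y_i64.T)   # [[ 0 -1 -1]]     <- what the cost function needs

# Inside cost_function, np.dot(-Y.T, np.log(h)) therefore multiplies the
# (negative) log-probabilities by 255 instead of -1, which is what drives
# the reported cost far below zero.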

