
What do the theta values of gradient descent mean?

I have all the components in place, I'm just not quite sure how to interpret the result. This is my output:

Theta-->: 0.09604203456288299, 1.1864676227195392

How do I interpret that? What does it mean?

I essentially just modified the example from this description. But I'm not sure if it's really applicable to my problem. I'm trying to perform binary classification on a set of documents. The documents are rendered as bag-of-words style feature vectors of the form:

Example:

Document 1 = ["I", "am", "awesome"]
Document 2 = ["I", "am", "great", "great"]

The dictionary is:

["I", "am", "awesome", "great"]

So the documents as vectors would look like:

Document 1 = [1, 1, 1, 0]
Document 2 = [1, 1, 0, 2]
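
For illustration, here is a minimal sketch of building such count vectors in Java; the `BagOfWords` class and `vectorize` helper are hypothetical, not part of my actual code:

import java.util.Arrays;
import java.util.List;

public class BagOfWords
{
    // Count how many times each dictionary word occurs in a tokenized document.
    public static double[] vectorize(final List<String> dictionary, final List<String> document)
    {
        final double[] vector = new double[dictionary.size()];
        for (final String token : document)
        {
            final int index = dictionary.indexOf(token);
            if (index >= 0)
            {
                vector[index] += 1.0; // e.g. "great" appears twice in Document 2
            }
        }
        return vector;
    }

    public static void main(final String[] args)
    {
        final List<String> dictionary = Arrays.asList("I", "am", "awesome", "great");
        // Prints [1.0, 1.0, 1.0, 0.0]
        System.out.println(Arrays.toString(vectorize(dictionary, Arrays.asList("I", "am", "awesome"))));
        // Prints [1.0, 1.0, 0.0, 2.0]
        System.out.println(Arrays.toString(vectorize(dictionary, Arrays.asList("I", "am", "great", "great"))));
    }
}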

This is my gradient descent code:

public static double [] gradientDescent(final double [] theta_in, final double alpha, final int num_iters, double[][] data ) 
{
    final double m = data.length;   // number of training examples
    double [] theta = theta_in;
    for (int i = 0; i < num_iters; i++) 
    {
        // Compute both partial-derivative sums from the current theta,
        // then update both components simultaneously.
        final double sum0 = gradientDescentSumScalar0(theta, alpha, data );
        final double sum1 = gradientDescentSumScalar1(theta, alpha, data );
        final double theta0 = theta[0] - ( (alpha / m) * sum0 );
        final double theta1 = theta[1] - ( (alpha / m) * sum1 );
        theta = new double [] { theta0, theta1 };
    }
    return theta;
}


// data holds the training rows; theta is the weight vector.
// Computes the hypothesis h_theta(x) = theta[0] + theta[1] * x for every row,
// where x is the first column of each row.
protected static double [] matrixMultipleHthetaByX( final double [] theta, double[][] data ) 
{
    final double [] vector = new double[ data.length ];
    int i = 0;
    for (final double [] d : data) 
    {
        vector[i] = (1.0 * theta[0]) + (d[0] * theta[1]);
        i++;
    }
    return vector;
}


// Sum of (h_theta(x) - y) * x_0 over all rows, where x_0 is the constant 1
// bias term; this is the partial-derivative sum for theta[0].
// (The alpha parameter is unused here.)
protected static double gradientDescentSumScalar0(final double [] theta, final double alpha, double[][] data ) 
{        
    double sum = 0;
    int i = 0;
    final double [] hthetaByXArr = matrixMultipleHthetaByX(theta, data );
    for (final double [] d : data) 
    {
        final double X = 1.0;       // x_0 = 1 (bias)
        final double y = d[1];      // second column is the target
        final double hthetaByX = hthetaByXArr[i];
        sum = sum + ( (hthetaByX - y) * X );
        i++;
    }
    return sum;
}

// Sum of (h_theta(x) - y) * x_1 over all rows; the partial-derivative sum
// for theta[1]. (The alpha parameter is unused here.)
protected static double gradientDescentSumScalar1(final double [] theta, final double alpha, double[][] data ) 
{        
    double sum = 0;
    int i = 0;
    final double [] hthetaByXArr = matrixMultipleHthetaByX(theta, data );
    for (final double [] d : data) 
    {
        final double X = d[0];      // first column is the feature x_1
        final double y = d[1];
        final double hthetaByX = hthetaByXArr[i];
        sum = sum + ( (hthetaByX - y) * X );
        i++;
    }
    return sum;
}

public static double [] batchGradientDescent( double [] weights, double[][] data ) 
{
    /*
     * From tex:
     * \theta_j := \theta_j - \alpha \frac{1}{m} \sum_{i=1}^m ( h_\theta(x^{(i)}) - y^{(i)} ) x_j^{(i)}
     */
    final double [] theta_in = weights;
    // alpha, iterations, and lastTheta are fields of the enclosing class (not shown).
    double [] theta = gradientDescent(theta_in, alpha, iterations, data );
    lastTheta = theta;
    System.out.println("Theta-->: " + theta[0] + ", " + theta[1]);
    return theta;
}

I call it like this:

   final int globoDictSize = globoDict.size(); // number of features

   double[] weights = new double[globoDictSize + 1];
   for (int i = 0; i < weights.length; i++) 
   {
       //weights[i] = Math.floor(Math.random() * 10000) / 10000;
       //weights[i] = randomNumber(0,1);
       weights[i] = 0.0;
   }


   int inputSize = trainingPerceptronInput.size();
   double[] outputs = new double[inputSize];
   final double[][] a = Prcptrn_InitOutpt.initializeOutput(trainingPerceptronInput, globoDictSize, outputs, LABEL);



       for (int p = 0; p < inputSize; p++) 
       {
           // Keep the returned theta so each pass continues from the previous one.
           weights = Gradient_Descent.batchGradientDescent( weights, a );
       }

How can I verify that this code is doing what I want? Shouldn't it be outputting a predicted label or something? I've heard I can also apply an error function to it, such as hinge loss, which would come after the call to batch gradient descent as a separate component, wouldn't it?
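
For example, one sanity check I could imagine is tracking the squared-error cost J(theta) and confirming it shrinks across iterations; this `cost` helper is hypothetical, not in my code yet:

// Hypothetical helper: squared-error cost J(theta) for the two-parameter model
// above; if gradient descent is working, this value should decrease across
// iterations. Assumes each row of data is { x, y }, as in the code above.
protected static double cost(final double [] theta, final double[][] data)
{
    final double m = data.length;
    double sum = 0;
    for (final double [] d : data)
    {
        final double h = theta[0] + (d[0] * theta[1]); // hypothesis h_theta(x)
        final double err = h - d[1];                   // d[1] is the label y
        sum += err * err;
    }
    return sum / (2.0 * m); // conventional 1/(2m) scaling
}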

Your code is complicated (I used to implement batch gradient descent in Octave, not in OO programming languages). But as far as I can see in your code (and it is common to use this notation), Theta is a parameter vector. After the gradient descent algorithm converges, it returns the optimal Theta vector. After that you can calculate the output for a new example with the formula:

theta_transposed * X,

where theta_transposed is the transposed vector of theta and X is a vector of input features.
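
For instance, a minimal sketch of that prediction step in Java; the predict method is hypothetical, and it assumes theta[0] holds the intercept weight, as in your code:

// Hypothetical sketch: h(x) = theta^T * x, with theta[0] as the intercept.
public static double predict(final double [] theta, final double [] x)
{
    double h = theta[0];              // bias / intercept weight
    for (int j = 0; j < x.length; j++)
    {
        h += theta[j + 1] * x[j];     // one weight per feature
    }
    return h;
}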

On a side note, the example you have referred to is a regression task (it is about linear regression). The task you describe is a classification problem, where instead of predicting some value (some number: weight, length, something else) you need to assign a label to the input. It can be solved with lots of different algorithms, but definitely not with the linear regression described in the article you posted.
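
If you do want a binary label out of a linear model, one common option (my suggestion, not something from the article you posted) is logistic regression, which passes theta_transposed * X through a sigmoid and thresholds the result. A minimal sketch, reusing the hypothetical predict method above:

// Hypothetical sketch: squash the linear score with the logistic (sigmoid)
// function and threshold at 0.5 to get a binary label.
public static int classify(final double [] theta, final double [] x)
{
    final double score = predict(theta, x);                    // theta^T * x
    final double probability = 1.0 / (1.0 + Math.exp(-score));
    return probability >= 0.5 ? 1 : 0;
}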

I also need to mention that it is not at all clear what kind of classification you are trying to perform. In your example you have a bag-of-words description (matrices of word counts). But where are the classification labels? Is it multi-output classification? Or just multi-class? Or binary?

I really suggest you take a course on ML, maybe on Coursera. This one is good: https://www.coursera.org/course/ml It also covers a full implementation of gradient descent.
