
What do the theta values of gradient descent mean?

I have all the components in place, I'm just not quite sure how to interpret the result. This is my output:

Theta-->: 0.09604203456288299, 1.1864676227195392

How do I interpret that? What does it mean?

I essentially just modified the example from this description. But I'm not sure if it's really applicable to my problem. I'm trying to perform binary classification on a set of documents. The documents are rendered as bag-of-words style feature vectors of the form:

Example:

Document 1 = ["I", "am", "awesome"]
Document 2 = ["I", "am", "great", "great"]

The dictionary is:

["I", "am", "awesome", "great"]

So the documents as vectors would look like:

Document 1 = [1, 1, 1, 0]
Document 2 = [1, 1, 0, 2]
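
For illustration, here is a minimal sketch of building such count vectors in Java; the `BagOfWords` class and `vectorize` helper are hypothetical, not part of my actual code:

import java.util.Arrays;
import java.util.List;

public class BagOfWords
{
    // Count how many times each dictionary word occurs in a tokenized document.
    public static double[] vectorize(final List<String> dictionary, final List<String> document)
    {
        final double[] vector = new double[dictionary.size()];
        for (final String token : document)
        {
            final int index = dictionary.indexOf(token);
            if (index >= 0)
            {
                vector[index] += 1.0; // e.g. "great" appears twice in Document 2
            }
        }
        return vector;
    }

    public static void main(final String[] args)
    {
        final List<String> dictionary = Arrays.asList("I", "am", "awesome", "great");
        // Prints [1.0, 1.0, 1.0, 0.0]
        System.out.println(Arrays.toString(vectorize(dictionary, Arrays.asList("I", "am", "awesome"))));
        // Prints [1.0, 1.0, 0.0, 2.0]
        System.out.println(Arrays.toString(vectorize(dictionary, Arrays.asList("I", "am", "great", "great"))));
    }
}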

This is my gradient descent code:

public static double [] gradientDescent(final double [] theta_in, final double alpha, final int num_iters, double[][] data ) 
{
    final double m = data.length;   // number of training examples
    double [] theta = theta_in;
    for (int i = 0; i < num_iters; i++) 
    {
        // Compute both partial-derivative sums from the current theta,
        // then update both components simultaneously.
        final double sum0 = gradientDescentSumScalar0(theta, alpha, data );
        final double sum1 = gradientDescentSumScalar1(theta, alpha, data );
        final double theta0 = theta[0] - ( (alpha / m) * sum0 );
        final double theta1 = theta[1] - ( (alpha / m) * sum1 );
        theta = new double [] { theta0, theta1 };
    }
    return theta;
}


// data holds the training rows; theta is the weight vector.
// Computes the hypothesis h_theta(x) = theta[0] + theta[1] * x for every row,
// where x is the first column of each row.
protected static double [] matrixMultipleHthetaByX( final double [] theta, double[][] data ) 
{
    final double [] vector = new double[ data.length ];
    int i = 0;
    for (final double [] d : data) 
    {
        vector[i] = (1.0 * theta[0]) + (d[0] * theta[1]);
        i++;
    }
    return vector;
}


// Sum of (h_theta(x) - y) * x_0 over all rows, where x_0 is the constant 1
// bias term; this is the partial-derivative sum for theta[0].
// (The alpha parameter is unused here.)
protected static double gradientDescentSumScalar0(final double [] theta, final double alpha, double[][] data ) 
{        
    double sum = 0;
    int i = 0;
    final double [] hthetaByXArr = matrixMultipleHthetaByX(theta, data );
    for (final double [] d : data) 
    {
        final double X = 1.0;       // x_0 = 1 (bias)
        final double y = d[1];      // second column is the target
        final double hthetaByX = hthetaByXArr[i];
        sum = sum + ( (hthetaByX - y) * X );
        i++;
    }
    return sum;
}

// Sum of (h_theta(x) - y) * x_1 over all rows; the partial-derivative sum
// for theta[1]. (The alpha parameter is unused here.)
protected static double gradientDescentSumScalar1(final double [] theta, final double alpha, double[][] data ) 
{        
    double sum = 0;
    int i = 0;
    final double [] hthetaByXArr = matrixMultipleHthetaByX(theta, data );
    for (final double [] d : data) 
    {
        final double X = d[0];      // first column is the feature x_1
        final double y = d[1];
        final double hthetaByX = hthetaByXArr[i];
        sum = sum + ( (hthetaByX - y) * X );
        i++;
    }
    return sum;
}

public static double [] batchGradientDescent( double [] weights, double[][] data ) 
{
    /*
     * From tex:
     * \theta_j := \theta_j - \alpha \frac{1}{m} \sum_{i=1}^m ( h_\theta(x^{(i)}) - y^{(i)} ) x_j^{(i)}
     */
    final double [] theta_in = weights;
    // alpha, iterations, and lastTheta are fields of the enclosing class (not shown).
    double [] theta = gradientDescent(theta_in, alpha, iterations, data );
    lastTheta = theta;
    System.out.println("Theta-->: " + theta[0] + ", " + theta[1]);
    return theta;
}

I call it like this:

   final int globoDictSize = globoDict.size(); // number of features

   double[] weights = new double[globoDictSize + 1];
   for (int i = 0; i < weights.length; i++) 
   {
       //weights[i] = Math.floor(Math.random() * 10000) / 10000;
       //weights[i] = randomNumber(0,1);
       weights[i] = 0.0;
   }


   int inputSize = trainingPerceptronInput.size();
   double[] outputs = new double[inputSize];
   final double[][] a = Prcptrn_InitOutpt.initializeOutput(trainingPerceptronInput, globoDictSize, outputs, LABEL);



       for (int p = 0; p < inputSize; p++) 
       {
           // Keep the returned theta so each pass continues from the previous one.
           weights = Gradient_Descent.batchGradientDescent( weights, a );
       }

How can I verify that this code is doing what I want? Shouldn't it be outputting a predicted label or something? I've heard I can also apply an error function to it, such as hinge loss, which would come after the call to batch gradient descent as a separate component, wouldn't it?
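
For example, one sanity check I could imagine is tracking the squared-error cost J(theta) and confirming it shrinks across iterations; this `cost` helper is hypothetical, not in my code yet:

// Hypothetical helper: squared-error cost J(theta) for the two-parameter model
// above; if gradient descent is working, this value should decrease across
// iterations. Assumes each row of data is { x, y }, as in the code above.
protected static double cost(final double [] theta, final double[][] data)
{
    final double m = data.length;
    double sum = 0;
    for (final double [] d : data)
    {
        final double h = theta[0] + (d[0] * theta[1]); // hypothesis h_theta(x)
        final double err = h - d[1];                   // d[1] is the label y
        sum += err * err;
    }
    return sum / (2.0 * m); // conventional 1/(2m) scaling
}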

Your code is complicated (I used to implement batch gradient descent in Octave, not in OO programming languages). But as far as I can see in your code (and it is common to use this notation), Theta is a parameter vector. After the gradient descent algorithm converges, it returns the optimal Theta vector. After that you can calculate the output for a new example with the formula:

theta_transposed * X,

where theta_transposed is the transposed vector of theta and X is a vector of input features.
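
For instance, a minimal sketch of that prediction step in Java; the predict method is hypothetical, and it assumes theta[0] holds the intercept weight, as in your code:

// Hypothetical sketch: h(x) = theta^T * x, with theta[0] as the intercept.
public static double predict(final double [] theta, final double [] x)
{
    double h = theta[0];              // bias / intercept weight
    for (int j = 0; j < x.length; j++)
    {
        h += theta[j + 1] * x[j];     // one weight per feature
    }
    return h;
}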

On a side note, the example you have referred to is a regression task (it is about linear regression). The task you describe is a classification problem, where instead of predicting some value (some number: weight, length, something else) you need to assign a label to the input. It can be solved with lots of different algorithms, but definitely not with the linear regression described in the article you posted.
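
If you do want a binary label out of a linear model, one common option (my suggestion, not something from the article you posted) is logistic regression, which passes theta_transposed * X through a sigmoid and thresholds the result. A minimal sketch, reusing the hypothetical predict method above:

// Hypothetical sketch: squash the linear score with the logistic (sigmoid)
// function and threshold at 0.5 to get a binary label.
public static int classify(final double [] theta, final double [] x)
{
    final double score = predict(theta, x);                    // theta^T * x
    final double probability = 1.0 / (1.0 + Math.exp(-score));
    return probability >= 0.5 ? 1 : 0;
}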

I also need to mention that it is not at all clear what kind of classification you are trying to perform. In your example you have a bag-of-words description (matrices of word counts). But where are the classification labels? Is it multi-output classification? Or just multi-class? Or binary?

I really suggest you take a course on ML, maybe on Coursera. This one is good: https://www.coursera.org/course/ml It also covers a full implementation of gradient descent.
