
What is the difference between gradient descent and cost function J(theta)?

I am learning machine learning on Coursera, but I'm a little confused about the difference between gradient descent and the cost function. When and where should I use each of them?

J(ϴ) can be minimized by a trial-and-error approach, i.e. trying many values of theta and then checking the output. In practice this means the work is done by hand and is time consuming.
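As a rough illustration (not from the course), here is a minimal Python sketch of that trial-and-error idea: evaluate J(ϴ) for many candidate values of theta and keep the best one. The names `brute_force_minimize`, `cost_fn` and `candidates`, as well as the toy cost (ϴ − 3)², are purely illustrative:

```python
# A minimal sketch (not from the original answer) of the trial-and-error idea:
# evaluate J(theta) for many candidate values and keep the one with the lowest cost.
import numpy as np

def brute_force_minimize(cost_fn, candidates):
    costs = [cost_fn(theta) for theta in candidates]  # check the output for each guess
    best = int(np.argmin(costs))
    return candidates[best], costs[best]

# Example: minimize J(theta) = (theta - 3)^2 over a coarse grid of guesses.
theta_best, j_min = brute_force_minimize(lambda t: (t - 3) ** 2, np.linspace(-10, 10, 201))
print(theta_best, j_min)  # roughly 3.0 and 0.0
```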

Gradient descent basically does the same thing, but in an automated way: it changes the theta values, or parameters, bit by bit, until we hopefully arrive at a minimum. It is an iterative method in which the model moves in the direction of steepest descent, i.e. toward the optimal values of theta.
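As a minimal sketch of what one such automated step looks like, assuming a linear model with a mean-squared-error cost (the names `gradient_descent_step`, `X`, `y` and the learning rate `alpha` are illustrative, not from the original answer):

```python
import numpy as np

def gradient_descent_step(theta, X, y, alpha=0.01):
    """One automated 'bit by bit' update of theta in the direction of steepest descent."""
    m = len(y)                                 # number of training examples
    predictions = X @ theta                    # h_theta(x) for every example
    gradient = (X.T @ (predictions - y)) / m   # partial derivatives of J(theta)
    return theta - alpha * gradient            # small step downhill, scaled by the learning rate
```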

Why use gradient descent? It is easy to implement and is a generic optimization technique, so it will still work even if you change your model. It is also the better choice when you have many features, because in that case computing ϴ directly (e.g. with the normal equation) becomes very expensive.

Gradient descent requires a cost function (there are many types of cost functions). One commonly used cost function is the mean squared error, which measures the average squared difference between the actual values in the dataset and the values predicted by the model.
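For concreteness, here is a minimal sketch of that cost, assuming a linear model h_ϴ(x) = Xϴ; the 1/(2m) scaling is the convention used in the Coursera course, and the names are illustrative:

```python
import numpy as np

def compute_cost(theta, X, y):
    """Mean squared error cost J(theta) for a linear model h_theta(x) = X @ theta."""
    m = len(y)                              # number of training examples
    errors = X @ theta - y                  # predicted value minus actual value
    return (errors ** 2).sum() / (2 * m)    # the 1/2 factor is the usual convention
```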

We need this cost function because we want to minimize it. Minimizing any function means finding the deepest valley of that function. Keep in mind that the cost function is used to monitor the error in an ML model's predictions, so minimizing it basically means reaching the lowest possible error value, i.e. increasing the accuracy of the model. In short, we increase accuracy by iterating over a training data set while tweaking the parameters (the weights and biases) of our model.
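Putting the two pieces together, a sketch of the whole loop might look like the following, again assuming a linear model with a mean-squared-error cost; `alpha` and `num_iters` are illustrative hyperparameters you would tune yourself:

```python
import numpy as np

def gradient_descent(X, y, theta, alpha=0.01, num_iters=1000):
    """Iterate over the training set, tweaking theta and monitoring the cost."""
    m = len(y)
    cost_history = []
    for _ in range(num_iters):
        gradient = (X.T @ (X @ theta - y)) / m          # direction of steepest ascent of J
        theta = theta - alpha * gradient                # tweak the parameters (step downhill)
        cost = ((X @ theta - y) ** 2).sum() / (2 * m)   # current value of J(theta)
        cost_history.append(cost)                       # monitor the error over iterations
    return theta, cost_history
```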

In short, the whole point of gradient descent is to minimize the cost function.
