
What is the difference between gradient descent and cost function J(theta)?

I am learning machine learning on Coursera, but I'm a little confused about the difference between gradient descent and the cost function. When and where should I use each of them?

J(ϴ) can be minimized by a trial-and-error approach, i.e. trying many values of theta and then checking the output. In practice this means the work is done by hand and is time consuming.
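As a rough illustration (not from the course), here is a minimal Python sketch of that trial-and-error idea: evaluate J(ϴ) for many candidate values of theta and keep the best one. The names `brute_force_minimize`, `cost_fn` and `candidates`, as well as the toy cost (ϴ − 3)², are purely illustrative:

```python
# A minimal sketch (not from the original answer) of the trial-and-error idea:
# evaluate J(theta) for many candidate values and keep the one with the lowest cost.
import numpy as np

def brute_force_minimize(cost_fn, candidates):
    costs = [cost_fn(theta) for theta in candidates]  # check the output for each guess
    best = int(np.argmin(costs))
    return candidates[best], costs[best]

# Example: minimize J(theta) = (theta - 3)^2 over a coarse grid of guesses.
theta_best, j_min = brute_force_minimize(lambda t: (t - 3) ** 2, np.linspace(-10, 10, 201))
print(theta_best, j_min)  # roughly 3.0 and 0.0
```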

Gradient descent basically does the same thing, but in an automated way: it changes the theta values, or parameters, bit by bit, until we hopefully arrive at a minimum. It is an iterative method in which the model moves in the direction of steepest descent, i.e. toward the optimal values of theta.
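As a minimal sketch of what one such automated step looks like, assuming a linear model with a mean-squared-error cost (the names `gradient_descent_step`, `X`, `y` and the learning rate `alpha` are illustrative, not from the original answer):

```python
import numpy as np

def gradient_descent_step(theta, X, y, alpha=0.01):
    """One automated 'bit by bit' update of theta in the direction of steepest descent."""
    m = len(y)                                 # number of training examples
    predictions = X @ theta                    # h_theta(x) for every example
    gradient = (X.T @ (predictions - y)) / m   # partial derivatives of J(theta)
    return theta - alpha * gradient            # small step downhill, scaled by the learning rate
```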

Why use gradient descent? It is easy to implement and is a generic optimization technique, so it will still work even if you change your model. It is also the better choice when you have many features, because in that case computing ϴ directly (e.g. with the normal equation) becomes very expensive.

Gradient descent requires a cost function (there are many types of cost functions). One commonly used cost function is the mean squared error, which measures the average squared difference between the actual values in the dataset and the values predicted by the model.
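For concreteness, here is a minimal sketch of that cost, assuming a linear model h_ϴ(x) = Xϴ; the 1/(2m) scaling is the convention used in the Coursera course, and the names are illustrative:

```python
import numpy as np

def compute_cost(theta, X, y):
    """Mean squared error cost J(theta) for a linear model h_theta(x) = X @ theta."""
    m = len(y)                              # number of training examples
    errors = X @ theta - y                  # predicted value minus actual value
    return (errors ** 2).sum() / (2 * m)    # the 1/2 factor is the usual convention
```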

We need this cost function because we want to minimize it. Minimizing any function means finding the deepest valley of that function. Keep in mind that the cost function is used to monitor the error in an ML model's predictions, so minimizing it basically means reaching the lowest possible error value, i.e. increasing the accuracy of the model. In short, we increase accuracy by iterating over a training data set while tweaking the parameters (the weights and biases) of our model.
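Putting the two pieces together, a sketch of the whole loop might look like the following, again assuming a linear model with a mean-squared-error cost; `alpha` and `num_iters` are illustrative hyperparameters you would tune yourself:

```python
import numpy as np

def gradient_descent(X, y, theta, alpha=0.01, num_iters=1000):
    """Iterate over the training set, tweaking theta and monitoring the cost."""
    m = len(y)
    cost_history = []
    for _ in range(num_iters):
        gradient = (X.T @ (X @ theta - y)) / m          # direction of steepest ascent of J
        theta = theta - alpha * gradient                # tweak the parameters (step downhill)
        cost = ((X @ theta - y) ** 2).sum() / (2 * m)   # current value of J(theta)
        cost_history.append(cost)                       # monitor the error over iterations
    return theta, cost_history
```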

In short, the whole point of gradient descent is to minimize the cost function.
