
Python: How is the code for Stochastic Gradient Descent working?

import numpy as np

# Assumed setup from earlier in the book's example (not shown in the question):
m = 100                                 # number of training instances
X = 2 * np.random.rand(m, 1)
y = 4 + 3 * X + np.random.randn(m, 1)
X_b = np.c_[np.ones((m, 1)), X]         # add x0 = 1 (bias term) to every instance

n_epochs = 50
t0, t1 = 5, 50  # learning schedule hyperparameters

def learning_schedule(t):
    return t0 / (t + t1)

theta = np.random.randn(2, 1)  # random initialization

for epoch in range(n_epochs):
    for i in range(m):
        random_index = np.random.randint(m)           # pick one random instance
        xi = X_b[random_index:random_index + 1]
        yi = y[random_index:random_index + 1]
        gradients = 2 * xi.T.dot(xi.dot(theta) - yi)  # MSE gradient on that single instance
        eta = learning_schedule(epoch * m + i)        # gradually decrease the learning rate
        theta = theta - eta * gradients

By convention we iterate in rounds of m iterations; each round is called an epoch. While the Batch Gradient Descent code iterated 1,000 times through the whole training set, this code goes through the training set only 50 times and reaches a pretty good solution.
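For comparison, here is a minimal Batch Gradient Descent sketch (an assumed reconstruction, not necessarily the book's exact code; eta = 0.1 is an assumed fixed learning rate, and X_b, y, m are the same objects as above). Each of its 1,000 iterations computes the gradient over all m instances and performs a single parameter update, so "iterations" and "passes over the data" coincide there; in the SGD code, by contrast, instances are sampled randomly with replacement, so an epoch simply means m updates rather than a guaranteed visit to every instance.

eta = 0.1                          # assumed fixed learning rate
n_iterations = 1000
theta_bgd = np.random.randn(2, 1)  # random initialization

for iteration in range(n_iterations):
    # full-batch gradient: uses all m instances for one parameter update
    gradients = 2 / m * X_b.T.dot(X_b.dot(theta_bgd) - y)
    theta_bgd = theta_bgd - eta * gradients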

The author says that the code iterates through the training set only 50 times. How is that possible?

Isn't it the case that for every epoch (1 -> 50), i goes from 1 -> 100, so the training data is being iterated 50*100 = 5000 times?

It will iterate over all the data points 50 times. For example, if there are 100 data points and 50 epochs, the total count is 50*100 = 5000 individual updates.

If it's Stochastic Gradient Descent, the total number of iterations (single-instance parameter updates) is 5000.

If it's plain (batch) Gradient Descent, the total number of iterations is 50, because each iteration is one update computed over the whole training set. The sketch below makes the counting concrete.
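A tiny sketch of the bookkeeping, using the hypothetical numbers from this answer (100 data points, 50 epochs):

n_epochs = 50
m = 100                       # hypothetical number of data points

sgd_updates = n_epochs * m    # SGD: one update per sampled instance per epoch
batch_updates = n_epochs      # batch GD: one update per full pass over the data

print(sgd_updates)    # 5000
print(batch_updates)  # 50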

If you want to know more about the different gradient descent techniques, visit HERE.
