Mini-batch gradient descent using numpy

I'm currently working through chapter four of Hands-On Machine Learning with Scikit-Learn, Keras and TensorFlow and am stuck trying to implement mini-batch optimization using NumPy.

The cost function is the MSE (the example uses gradient descent to optimize a linear regression).
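
For reference, a minimal sketch of the MSE cost and its batch gradient for linear regression (with m training examples, design matrix X including the bias column, and targets y):

\mathrm{MSE}(\theta) = \frac{1}{m} \sum_{i=1}^{m} \left( \theta^\top x^{(i)} - y^{(i)} \right)^2, \qquad \nabla_\theta \, \mathrm{MSE}(\theta) = \frac{2}{m} X^\top (X\theta - y)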

The code follows below:

import numpy as np
X = 2 * np.random.rand(100,1) # Simulate Linear Data
y = 4 + 3 * X + np.random.randn(100,1)
X_b = np.c_[np.ones((100,1)),X] # add the bias (intercept) column
## Mini batch gradient descent

n_epochs = 50
MINIBATCH_SIZE = 10
rng = np.random.default_rng()
t0,t1 = 5, 50 # learning schedule hyperparameters
m = 100

def learning_schedule(t):
    return t0/(t+t1)

theta = np.random.randn(2,1) # randomly initialise weights

mbgd = np.array([[],[]]) # 2 x 0 array that will store theta after every update


# 50 epochs; in each epoch we perform 100 updates, each on a randomly drawn mini-batch (so the same point can appear in several batches)
# theta/weights are carried across epochs

for epoch in range(n_epochs): 
    for i in range(m): 
        random_indexes = rng.choice(m,MINIBATCH_SIZE,replace=False) # sample a mini-batch without replacement
        xi =  X_b[random_indexes]
        yi = y[random_indexes]
        gradients = 2 * xi.T.dot(xi.dot(theta) - yi) # partial derivatives of the MSE cost w.r.t. theta, summed (not averaged) over the mini-batch
        eta = learning_schedule(epoch * m + i)
        theta = theta - eta * gradients
        mbgd = np.concatenate([mbgd,theta],axis = 1)

However, when I look at the values in mbgd, they appear to spiral out to values such as -2.5871605790804576e+17 and 3.197175730045784e+17.

I was wondering if I had implemented mini-batch gradient descent correctly, since the graph in the book looks much more stable:

(figure from the book)

The code seems okay. The behavior is very unstable in the first 100 iterations or so, but it tends to stabilize around the expected values (i.e., the parameters of the linear function).

To check this, you can plot the mbgd values and look at the values they converge to. The plot in the book is a little misleading, though, since the estimated values don't seem to bounce around very intensely at the beginning.
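
Before plotting, a quick numeric check is also possible. This is just a sketch against the mbgd array built above: compare the last stored estimates with the parameters used to generate the data (intercept 4, slope 3).

print(theta)                       # final parameter vector after training
print(mbgd[:, -1])                 # same values: last column of the stored history
print(mbgd[:, -500:].mean(axis=1)) # average over the last 500 updates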

I re-ran the same code using new parameters for the learning_schedule function. I chose t1 = 500 so that eta stays low, which makes the estimated parameters bounce around less.

The gradient descent seems less unstable, and since the problem is easy this did not make the computations slow.
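
Concretely, the only change is the schedule hyperparameter (a sketch; I'm assuming t0 stays at 5 as in the question):

t0, t1 = 5, 500 # larger t1 keeps eta = t0/(t+t1) small from the first iteration

def learning_schedule(t):
    return t0/(t+t1)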

import matplotlib.pyplot as plt
plt.figure()
plt.plot(np.arange(5000), mbgd[0,:5000]) # intercept estimates across the 5000 updates
plt.plot(np.arange(5000), mbgd[1,:5000]) # slope estimates across the 5000 updates

(plot of the two parameter trajectories over the first 5000 iterations)

If I plot it like the book's figure, you can see that it converges very fast. Here I plotted only the first 100 iterations.

plt.figure()
plt.plot(mbgd[0,:100], mbgd[1,:100], '-') # path of (theta0, theta1) over the first 100 updates

(plot of theta0 against theta1 for the first 100 iterations)

Here is another way to look at the optimization result (the first 500 iterations):

from matplotlib import cm
linx = np.arange(0,2,.01)            # x grid covering the simulated data range
plt.plot(X_b[:,1], y.squeeze(), 'o') # the simulated data points
viridis = cm.get_cmap('viridis', 12)
colmap = viridis(np.linspace(0, 1, 50)) # one colour per plotted iteration
for i in range(50):
    # fitted line at every 10th iteration (first 500 updates)
    plt.plot(linx, mbgd[0,i*10]+linx*mbgd[1,i*10], color=colmap[i])

(plot of the data points with the fitted line at every 10th iteration)
