Problem understanding how a linear regression model is tuned in tf.keras

Question

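I'm working through the Colab exercise on linear regression with synthetic data, which explores linear regression with a toy dataset. A linear regression model is built and trained, and you can play with the learning rate, the number of epochs and the batch size. I have trouble understanding how the iterations are done and how they relate to "epoch" and "batch size". I basically don't understand how the actual model is trained, how the data is processed and how the iterations are carried out. To understand this, I want to follow along by calculating each step by hand, so I would like the slope and intercept coefficients for every step. That way I can see what data the "computer" uses, what goes into the model, what model results from each specific iteration, and how the iterations are done. I first tried to get the slope and intercept for each step, but failed, because the slope and intercept are only output at the very end. My modified code (the original, I only added:)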

  print("Slope")
  print(trained_weight)
  print("Intercept")
  print(trained_bias)

Code:

import pandas as pd
import tensorflow as tf
from matplotlib import pyplot as plt

#@title Define the functions that build and train a model
def build_model(my_learning_rate):
  """Create and compile a simple linear regression model."""
  # Most simple tf.keras models are sequential. 
  # A sequential model contains one or more layers.
  model = tf.keras.models.Sequential()

  # Describe the topography of the model.
  # The topography of a simple linear regression model
  # is a single node in a single layer. 
  model.add(tf.keras.layers.Dense(units=1, 
                                  input_shape=(1,)))

  # Compile the model topography into code that 
  # TensorFlow can efficiently execute. Configure 
  # training to minimize the model's mean squared error. 
  model.compile(optimizer=tf.keras.optimizers.RMSprop(learning_rate=my_learning_rate),
                loss="mean_squared_error",
                metrics=[tf.keras.metrics.RootMeanSquaredError()])
 
  return model           


def train_model(model, feature, label, epochs, batch_size):
  """Train the model by feeding it data."""

  # Feed the feature values and the label values to the 
  # model. The model will train for the specified number 
  # of epochs, gradually learning how the feature values
  # relate to the label values. 
  history = model.fit(x=feature,
                      y=label,
                      batch_size=batch_size,
                      epochs=epochs)

  # Gather the trained model's weight and bias.
  trained_weight = model.get_weights()[0]
  trained_bias = model.get_weights()[1]
  print("Slope")
  print(trained_weight)
  print("Intercept")
  print(trained_bias)
  # The list of epochs is stored separately from the 
  # rest of history.
  epochs = history.epoch

  # Gather the history (a snapshot) of each epoch.
  hist = pd.DataFrame(history.history)

  # print(hist)
  # Specifically gather the model's root mean
  # squared error at each epoch.
  rmse = hist["root_mean_squared_error"]

  return trained_weight, trained_bias, epochs, rmse

print("Defined build_model and train_model")

#@title Define the plotting functions
def plot_the_model(trained_weight, trained_bias, feature, label):
  """Plot the trained model against the training feature and label."""

  # Label the axes.
  plt.xlabel("feature")
  plt.ylabel("label")

  # Plot the feature values vs. label values.
  plt.scatter(feature, label)

  # Create a red line representing the model. The red line starts
  # at coordinates (x0, y0) and ends at coordinates (x1, y1).
  x0 = 0
  y0 = trained_bias
  x1 = feature[-1]
  y1 = trained_bias + (trained_weight * x1)
  plt.plot([x0, x1], [y0, y1], c='r')

  # Render the scatter plot and the red line.
  plt.show()

def plot_the_loss_curve(epochs, rmse):
  """Plot the loss curve, which shows loss vs. epoch."""

  plt.figure()
  plt.xlabel("Epoch")
  plt.ylabel("Root Mean Squared Error")

  plt.plot(epochs, rmse, label="Loss")
  plt.legend()
  plt.ylim([rmse.min()*0.97, rmse.max()])
  plt.show()

print("Defined the plot_the_model and plot_the_loss_curve functions.")

my_feature = ([1.0, 2.0,  3.0,  4.0,  5.0,  6.0,  7.0,  8.0,  9.0, 10.0, 11.0, 12.0])
my_label   = ([5.0, 8.8,  9.6, 14.2, 18.8, 19.5, 21.4, 26.8, 28.9, 32.0, 33.8, 38.2])

learning_rate=0.05
epochs=1
my_batch_size=12

my_model = build_model(learning_rate)
trained_weight, trained_bias, epochs, rmse = train_model(my_model, my_feature, 
                                                         my_label, epochs,
                                                         my_batch_size)
plot_the_model(trained_weight, trained_bias, my_feature, my_label)
plot_the_loss_curve(epochs, rmse)

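In my specific case, my output is:

Now I tried to replicate this in a simple Excel sheet and calculate the RMSE by hand:

However, I get 21.8 instead of 23.1. Also, my loss is not 535.48 but 476.82.

So my first question is: where is my mistake, and how is the RMSE calculated?

Second question: how do I get the RMSE for each specific iteration? Let's say the number of epochs is 4 and the batch size is 4.

That gives 4 epochs and 3 batches of 4 examples (observations) each. I don't understand how the model is trained through these iterations. So how do I get the coefficients and the RMSE of each regression model? Not only for each epoch (so 4), but for each iteration. I think each epoch has 3 iterations, so in total there are 12 linear regression models? I would like to see these 12 models. Also, when no information is given, what initial values are used at the starting point, i.e. what slope and intercept? I mean from the very first point; I didn't specify this. Then I would like to be able to see how the slope and intercept are adjusted at each step. I think this comes from the gradient descent algorithm, but that would just be a bonus. More important to me is first understanding how these iterations are done and how they relate to epochs and batches.

Update: I know that the initial values (slope and intercept) are chosen randomly.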

Answer 1

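Basics

Problem statement

Let's consider a linear regression model for a set of samples X, where each sample is represented by a single feature x. As part of training, we are searching for the line wx + b such that ((w.x + b) - y)^2 (the squared loss) is minimal. For a set of data points we take the mean of the per-sample squared losses, the so-called mean squared error (MSE). w and b, the weight and the bias, are collectively referred to as the weights.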
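To make the definition concrete, here is a small NumPy sketch (not part of the original answer) that computes the MSE and RMSE of a line y_hat = w*x + b on the question's toy data. The values w = 0.5 and b = 0.0 are an assumption, chosen to match the initial slope and bias used in Answer 2; the helper name `mse` is illustrative.

```python
import numpy as np

def mse(w, b, x, y):
    """Mean squared error of the line y_hat = w*x + b over all samples."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return np.mean((w * x + b - y) ** 2)

my_feature = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0, 11.0, 12.0]
my_label = [5.0, 8.8, 9.6, 14.2, 18.8, 19.5, 21.4, 26.8, 28.9, 32.0, 33.8, 38.2]

loss = mse(0.5, 0.0, my_feature, my_label)   # the "loss" Keras would report
rmse = np.sqrt(loss)                         # root of the *mean*, not a mean of roots
print(loss, rmse)                            # ≈ 402.985, ≈ 20.07
```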

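Fitting the line / training the model

There is a closed-form solution to the linear regression problem: (X^T X)^-1 X^T y.
We can also use gradient descent to search for the weights that minimize the squared loss. Frameworks such as tensorflow and pytorch use gradient descent to search for the weights (this is called training).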
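As a sketch of the closed-form route (an illustration only, not how Keras trains), the normal equation can be solved with NumPy on the question's data. A column of ones is appended to X so that the bias is estimated together with the slope, and `np.linalg.solve` is used instead of forming the inverse explicitly, which is numerically preferable but computes the same formula. The result is the best line any training run on this data can converge to.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0, 11.0, 12.0])
y = np.array([5.0, 8.8, 9.6, 14.2, 18.8, 19.5, 21.4, 26.8, 28.9, 32.0, 33.8, 38.2])

# Design matrix: the feature column plus a column of ones for the bias term.
X = np.column_stack([x, np.ones_like(x)])

# Solve (X^T X) theta = X^T y for theta = [w, b].
w, b = np.linalg.solve(X.T @ X, X.T @ y)
print(f"slope w = {w:.4f}, intercept b = {b:.4f}")  # slope w = 2.9594, intercept b = 2.1803
```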

Gradient descent

The gradient descent algorithm for learning the regression looks like this:

w, b = some initial value
While model has not converged:
    y_hat = w.X + b
    error = MSE(y, y_hat) 
    back propagate (BPP) error and adjust weights

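Each run of the loop above is called an epoch. However, due to resource constraints, y_hat, the error and BPP are not computed on the full dataset at once; instead, the data is split into smaller batches and the operations above are performed on one batch at a time. Also, we usually fix the number of epochs and monitor whether the model has converged.

w, b = some initial value
for i in range(number_of_epochs):
    for X_batch, y_batch in get_next_batch(X, y):
        y_hat = w.X_batch + b
        error = MSE(y_batch, y_hat)
        back propagate (BPP) error and adjust weights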
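The loop above can be turned into a runnable NumPy sketch. It is an illustration under stated assumptions, not what Keras does internally: it uses plain gradient descent (the Colab uses RMSprop), no shuffling, an assumed starting point w = 0.5, b = 0.0 and an assumed learning rate of 0.005. It records the slope, intercept and batch loss at every iteration, so with 12 examples, batch_size = 4 and 4 epochs you get the 12 intermediate models the question asks about. The function name `minibatch_gd` is hypothetical.

```python
import numpy as np

def minibatch_gd(x, y, w, b, learning_rate, epochs, batch_size):
    """Plain (non-RMSprop) mini-batch gradient descent on the MSE of y_hat = w*x + b.

    Returns one record (epoch, iteration, w, b, batch_mse) per iteration;
    batch_mse is the batch loss measured before that iteration's update.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    records = []
    for epoch in range(epochs):
        # No shuffling: the batches are always the same consecutive slices.
        for it, start in enumerate(range(0, len(x), batch_size)):
            xb, yb = x[start:start + batch_size], y[start:start + batch_size]
            err = (w * xb + b) - yb              # forward pass: y_hat - y
            batch_mse = np.mean(err ** 2)        # cost on this batch
            grad_w = 2 * np.mean(err * xb)       # dMSE/dw
            grad_b = 2 * np.mean(err)            # dMSE/db
            w = w - learning_rate * grad_w       # update the parameters
            b = b - learning_rate * grad_b
            records.append((epoch, it, w, b, batch_mse))
    return records

my_feature = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0, 11.0, 12.0]
my_label = [5.0, 8.8, 9.6, 14.2, 18.8, 19.5, 21.4, 26.8, 28.9, 32.0, 33.8, 38.2]

records = minibatch_gd(my_feature, my_label, w=0.5, b=0.0,
                       learning_rate=0.005, epochs=4, batch_size=4)
for epoch, it, w, b, batch_mse in records:
    print(f"epoch {epoch} iteration {it}: w={w:.4f} b={b:.4f} batch MSE={batch_mse:.3f}")
```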

Keras batch execution

Suppose we want to add the root mean squared error to track the model's performance during training. This is how Keras implements it:

w, b = some initial value
for i in range(number_of_epochs):
    all_y_hats = []
    all_ys = []
    for X_batch, y_batch in get_next_batch(X, y):
        y_hat = w.X_batch + b
        error = MSE(y_batch, y_hat)

        all_y_hats.extend(y_hat)
        all_ys.extend(y_batch)

        batch_rms_error = RMSE(all_ys, all_y_hats)

        back propagate (BPP) error and adjust weights

As you can see above, the predictions are accumulated, and the RMSE is computed from the accumulated predictions rather than by averaging the RMSEs of all previous batches.
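A tiny NumPy sketch of this difference, with made-up numbers (two batches whose errors are all 1 and all 3): the running, Keras-style RMSE pools the squared errors of every batch seen so far, which gives a different value than averaging the per-batch RMSEs. The function name `running_rmse` is hypothetical.

```python
import numpy as np

def running_rmse(y_batches, y_hat_batches):
    """Keras-style running RMSE: after each batch, take the root of the mean
    squared error over *all* examples seen so far in the epoch."""
    total_sq_err, count, history = 0.0, 0, []
    for yb, yhb in zip(y_batches, y_hat_batches):
        err = np.asarray(yb, dtype=float) - np.asarray(yhb, dtype=float)
        total_sq_err += float(np.sum(err ** 2))
        count += err.size
        history.append(float(np.sqrt(total_sq_err / count)))
    return history

# Two made-up batches: every error is 1 in the first, 3 in the second.
y_batches = [[1.0, 1.0], [3.0, 3.0]]
y_hat_batches = [[0.0, 0.0], [0.0, 0.0]]

running = running_rmse(y_batches, y_hat_batches)
per_batch = [float(np.sqrt(np.mean((np.array(a) - np.array(p)) ** 2)))
             for a, p in zip(y_batches, y_hat_batches)]
print("running RMSE after each batch:", running)       # 1.0, then sqrt(5) ≈ 2.236
print("mean of per-batch RMSEs:", np.mean(per_batch))  # 2.0
```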

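Implementation in keras

Now that the basics are clear, let's see how to implement the same tracking in keras. keras has callbacks, so we can hook into the on_batch_begin callback and accumulate the predictions in all_y_hats (since shuffle=False, the matching targets are simply the first len(all_y_hats) entries of y). In the on_batch_end callback, keras gives us the RMSE it computed. We will compute the RMSE manually from what we accumulated and verify that it matches the value keras computed. We will also save the weights so that we can later plot the line being learned.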

import numpy as np
from sklearn.metrics import mean_squared_error
import keras
import matplotlib.pyplot as plt

# Some training data
X = np.arange(16)
y = 0.5*X +0.2

batch_size = 8
all_y_hats = []
learned_weights = [] 

class CustomCallback(keras.callbacks.Callback):
  def on_batch_begin(self, batch, logs=None):
    # Weights before this batch's update: these produce the predictions
    # that Keras uses for this batch's loss and metrics.
    w = self.model.layers[0].weights[0].numpy()[0][0]
    b = self.model.layers[0].weights[1].numpy()[0]
    s = batch*batch_size
    all_y_hats.extend(b + w*X[s:s+batch_size])
    learned_weights.append([w,b])

  def on_batch_end(self, batch, logs=None):
    # RMSE over all predictions accumulated so far in this epoch.
    calculated_error = np.sqrt(mean_squared_error(all_y_hats, y[:len(all_y_hats)]))
    print(f"\n Calculated: {calculated_error},  Actual: {logs['root_mean_squared_error']}")
    assert np.isclose(calculated_error, logs['root_mean_squared_error'])

  def on_epoch_end(self, epoch, logs=None):
    # Keras resets its metric state every epoch, so reset our accumulator too.
    del all_y_hats[:]


model = keras.models.Sequential()
model.add(keras.layers.Dense(1, input_shape=(1,)))
model.compile(optimizer=keras.optimizers.RMSprop(learning_rate=0.01), loss="mean_squared_error",  metrics=[keras.metrics.RootMeanSquaredError()])
# We should set shuffle=False so that we know how batches are divided
history = model.fit(X,y, epochs=100, callbacks=[CustomCallback()], batch_size=batch_size, shuffle=False)

Output:

Epoch 1/100
 8/16 [==============>...............] - ETA: 0s - loss: 16.5132 - root_mean_squared_error: 4.0636
 Calculated: 4.063645694548688,  Actual: 4.063645839691162

 Calculated: 8.10112834945773,  Actual: 8.101128578186035
16/16 [==============================] - 0s 3ms/step - loss: 65.6283 - root_mean_squared_error: 8.1011
Epoch 2/100
 8/16 [==============>...............] - ETA: 0s - loss: 14.0454 - root_mean_squared_error: 3.7477
 Calculated: 3.7477213352845675,  Actual: 3.7477214336395264
-------------- truncated -----------------------

Ta-da! The assertion assert np.isclose(calculated_error, logs['root_mean_squared_error']) never failed, so our calculation/understanding is correct.

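The line

Finally, let's plot the line that the BPP algorithm adjusts based on the mean squared error loss. The code below creates one png image per batch, containing the line learned at that point together with the training data.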

for i, (w,b) in enumerate(learned_weights):
  plt.close()
  plt.axis([-1, 18, -1, 10])
  plt.scatter(X, y)
  plt.plot([-1,17], [-1*w+b, 17*w+b], color='green')
  plt.savefig(f'img{i+1}.png')

Below is a gif animation of the images above, in the order in which they were learned.

Learning the hyperplane (a line in this case) when y = 0.5*X + 5.2

Answer 2

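I played around with it a bit, and I think it works like this:

1. The weights are initialized for each feature (usually randomly, depending on the settings). A bias with an initial value of 0.0 is also initialized.
2. The loss and metrics for the first batch are computed and printed, and then the weights and bias are updated.
3. Step 2 is repeated for all batches in the epoch; however, after the last batch the loss and metrics are not printed again, so what you see on screen are the loss and metrics from before the last update in the epoch.
4. A new epoch begins, and the first metrics and loss you see are actually the ones computed with the weights from the last update of the previous epoch...

So basically, I think it's intuitive to say that the loss is computed first and the weights are updated afterwards, which means the weight update is the last operation in an epoch.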

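If your model is trained for one epoch with one batch, what you see on screen is the loss computed with the initial weights and bias. If you want to see the loss and metrics after each epoch ends (with the most "current" weights), you can pass validation_data=(X,y) to the fit method. This tells the algorithm to compute the loss and metrics again on the given validation data at the end of the epoch.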
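This can be checked by hand. The NumPy sketch below assumes the setting of this answer (initial slope 0.5, bias 0.0, learning rate 0.05, all 12 examples in one batch, one epoch) and assumes the textbook RMSprop update with the Keras default rho = 0.9 and epsilon = 1e-7; the exact update formula in a given TensorFlow version may differ slightly. The loss before the step should correspond to the printed `loss`, and the loss after the step to `val_loss`, in the output further down in this answer. The helper names are hypothetical.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0, 11.0, 12.0])
y = np.array([5.0, 8.8, 9.6, 14.2, 18.8, 19.5, 21.4, 26.8, 28.9, 32.0, 33.8, 38.2])

def mse(w, b):
    """Mean squared error of y_hat = w*x + b on the whole dataset."""
    return np.mean((w * x + b - y) ** 2)

def rmsprop_step(params, grads, accums, lr=0.05, rho=0.9, eps=1e-7):
    """One textbook RMSprop step (assumed form):
    v <- rho*v + (1-rho)*g^2 ;  p <- p - lr*g / (sqrt(v) + eps)."""
    new_accums = [rho * v + (1 - rho) * g ** 2 for v, g in zip(accums, grads)]
    new_params = [p - lr * g / (np.sqrt(v) + eps)
                  for p, g, v in zip(params, grads, new_accums)]
    return new_params, new_accums

w, b = 0.5, 0.0                     # initial slope and bias, as in this answer
loss_before = mse(w, b)             # loss on the single batch, before the update

err = w * x + b - y
grads = [2 * np.mean(err * x), 2 * np.mean(err)]    # dMSE/dw, dMSE/db
(w, b), _ = rmsprop_step([w, b], grads, accums=[0.0, 0.0])

loss_after = mse(w, b)              # loss with the updated slope and bias
print(loss_before, loss_after)      # ≈ 402.985, ≈ 352.335
print(w, b)                         # ≈ 0.6581, ≈ 0.1581
```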

As for the model's initial weights, you can experiment by setting some initial weights for the layer manually (using the kernel_initializer argument):

  model.add(tf.keras.layers.Dense(units=1,
                                  input_shape=(1,),
                                  kernel_initializer=tf.constant_initializer(.5)))

Here is the updated part of the train_model function, which shows what I mean:

import numpy as np

def train_model(model, feature, label, epochs, batch_size):
  """Train the model by feeding it data."""

  # Record the slope and bias the model starts with (before any update).
  init_slope = model.get_weights()[0][0][0]
  init_bias = model.get_weights()[1][0]
  print('init slope is {}'.format(init_slope))
  print('init bias is {}'.format(init_bias))

  # Feed the feature values and the label values to the model.
  # validation_data makes Keras evaluate the loss and metrics again at the
  # end of each epoch, i.e. with the updated weights.
  history = model.fit(x=feature,
                      y=label,
                      batch_size=batch_size,
                      epochs=epochs,
                      validation_data=(feature, label))

  # Gather the trained model's weight and bias.
  trained_weight = model.get_weights()[0]
  trained_bias = model.get_weights()[1]
  print("Slope")
  print(trained_weight)
  print("Intercept")
  print(trained_bias)

  # Recompute the loss (MSE) and RMSE by hand with the updated slope and bias.
  prediction_manual = [trained_weight[0][0]*i + trained_bias[0] for i in feature]
  manual_loss = np.mean((np.array(label) - np.array(prediction_manual))**2)
  print('manually computed loss after slope and bias update is {}'.format(manual_loss))
  print('manually computed rmse after slope and bias update is {}'.format(manual_loss**(1/2)))

  # Recompute the loss and RMSE by hand with the initial slope and bias.
  prediction_manual_init = [init_slope*i + init_bias for i in feature]
  manual_loss_init = np.mean((np.array(label) - np.array(prediction_manual_init))**2)
  print('manually computed loss with init slope and bias is {}'.format(manual_loss_init))
  # This prints the RMSE (the square root of the loss), as in the output below.
  print('manually copmuted loss with init slope and bias is {}'.format(manual_loss_init**(1/2)))

Output:

"""
init slope is 0.5
init bias is 0.0
1/1 [==============================] - 0s 117ms/step - loss: 402.9850 - root_mean_squared_error: 20.0745 - val_loss: 352.3351 - val_root_mean_squared_error: 18.7706
Slope
[[0.65811384]]
Intercept
[0.15811387]
manually computed loss after slope and bias update is 352.3350379264957
manually computed rmse after slope and bias update is 18.77058970641295
manually computed loss with init slope and bias is 402.98499999999996
manually copmuted loss with init slope and bias is 20.074486294797182
"""

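Note that the loss and metric computed manually after the slope and bias update match the validation loss and metric, while the loss and metric computed manually with the initial slope and bias match the training loss and metric that Keras printed.

As for the second question, I think you can split the data into batches manually, then loop over the batches and fit on each one. Then, at each iteration, the model prints the loss and metrics on the validation data. Something like this: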

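  # Record the starting slope and intercept.
  init_slope = model.get_weights()[0][0][0]
  init_bias = model.get_weights()[1][0]
  print('init slope is {}'.format(init_slope))
  print('init bias is {}'.format(init_bias))
  batch_size = 3

  # Fit on one consecutive chunk of batch_size examples at a time. Because
  # batch_size=1000 exceeds the chunk size, each fit call treats the chunk as a
  # single batch, so with epochs=1 every call is exactly one weight update;
  # validation_data then reports the loss on all the data after that update.
  for idx in range(0,len(feature),batch_size):
      model.fit(x=feature[idx:idx+batch_size],
                y=label[idx:idx+batch_size],
                batch_size=1000,
                epochs=epochs,
                validation_data=(feature,label))
      print('slope: {}'.format(model.get_weights()[0][0][0]))
      print('intercept: {}'.format(model.get_weights()[1][0]))
      print('x data used: {}'.format(feature[idx:idx+batch_size]))
      print('y data used: {}'.format(label[idx:idx+batch_size]))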

Output:

init slope is 0.5
init bias is 0.0
1/1 [==============================] - 0s 117ms/step - loss: 48.9000 - root_mean_squared_error: 6.9929 - val_loss: 352.3351 - val_root_mean_squared_error: 18.7706
slope: 0.6581138372421265
intercept: 0.15811386704444885
x data used: [1.0, 2.0, 3.0]
y data used: [5.0, 8.8, 9.6]
1/1 [==============================] - 0s 21ms/step - loss: 200.9296 - root_mean_squared_error: 14.1750 - val_loss: 306.3082 - val_root_mean_squared_error: 17.5017
slope: 0.8132714033126831
intercept: 0.3018075227737427
x data used: [4.0, 5.0, 6.0]
y data used: [14.2, 18.8, 19.5]
1/1 [==============================] - 0s 22ms/step - loss: 363.2630 - root_mean_squared_error: 19.0595 - val_loss: 266.7119 - val_root_mean_squared_error: 16.3313
slope: 0.9573485255241394
intercept: 0.42669767141342163
x data used: [7.0, 8.0, 9.0]
y data used: [21.4, 26.8, 28.9]
1/1 [==============================] - 0s 22ms/step - loss: 565.5593 - root_mean_squared_error: 23.7815 - val_loss: 232.1553 - val_root_mean_squared_error: 15.2366
slope: 1.0924618244171143
intercept: 0.5409283638000488
x data used: [10.0, 11.0, 12.0]
y data used: [32.0, 33.8, 38.2]

Answer 3

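Linear Regression Model

A linear regression model is just a single neuron with a linear activation function. Training the model is based on gradient descent. Passing the entire dataset through the model once and updating the weights is called 1 epoch. In this setting, however, there is no difference between an iteration and an epoch.

Basic training steps: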

Prepare data
Initialize the model and its parameters (weights and biases)
for each epoch:  #(both iteration and epoch same here)
    Forward Propagation
    Compute Cost
    Back Propagation
    Update Parameters

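Gradient descent has three variants:

Batch Gradient Descent (BGD)
Stochastic Gradient Descent (SGD)
Mini-Batch Gradient Descent (MGD)

Batch gradient descent is what we talked about above (passing the whole dataset at once). It is often simply called gradient descent.

In stochastic gradient descent, we pass 1 random example at a time, and the weights are updated after each example. This is where iterations come into play: once the model has been trained on 1 example, 1 iteration is complete. However, the dataset still contains more examples the model hasn't seen yet. Training on all of these examples once is called 1 epoch. Because only 1 example is passed at a time, SGD is very slow on larger datasets, as it loses the benefit of vectorization.

So we generally use Mini-Batch Gradient Descent. Here the dataset is split into many chunks of a fixed size. The size of each chunk is called the batch size, and it can be anywhere between 1 and the size of the dataset. In every epoch, these batches are used to train the model.

1 iteration processes 1 batch of data. 1 epoch processes the whole dataset. 1 epoch contains 1 or more iterations.

So if the size of the data is m, the amount of data fed in during each iteration is:

BGD = m
SGD = 1
MGD = x, where 1 < x < m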
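As a quick check of these counts, here is a small Python sketch; the helper `iterations_per_epoch` is hypothetical and assumes the last batch may be smaller than the others (Keras likewise processes leftover examples as a final, smaller batch). With the question's 12 examples, batch size 4 and 4 epochs, it gives 3 iterations per epoch and 12 in total, matching the 12 models the question expects.

```python
import math

def iterations_per_epoch(m, batch_size):
    """Number of weight updates in one epoch when m examples are split into
    batches of batch_size (the last batch may be smaller)."""
    return math.ceil(m / batch_size)

m = 12
for name, batch_size in [("BGD", m), ("SGD", 1), ("MGD", 4)]:
    print(name, "batch size", batch_size, "->",
          iterations_per_epoch(m, batch_size), "iterations per epoch")

epochs = 4
print("total iterations:", epochs * iterations_per_epoch(m, 4))  # total iterations: 12
```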

Basic training steps for MGD:

Prepare data
Initialize the model and its parameters (weights and biases)
for each epoch:  #(epoch)
    for each mini_batch: #(iteration)
        Forward Propagation
        Compute Cost
        Back Propagation
        Update Parameters
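To see how the mini-batches of one epoch are formed, here is a NumPy sketch. It assumes the examples are reshuffled once per epoch, which is what Keras `fit` does by default (`shuffle=True`), and that `shuffle=False` simply yields consecutive slices; `minibatch_indices` is a hypothetical helper. In every epoch, each example lands in exactly one batch.

```python
import numpy as np

def minibatch_indices(m, batch_size, rng, shuffle=True):
    """Yield the index arrays of one epoch's mini-batches.

    shuffle=True draws a fresh permutation of the m examples for the epoch;
    shuffle=False keeps the original order (consecutive slices)."""
    order = rng.permutation(m) if shuffle else np.arange(m)
    for start in range(0, m, batch_size):
        yield order[start:start + batch_size]

rng = np.random.default_rng(42)
for epoch in range(2):                      # each epoch gets its own shuffle
    batches = list(minibatch_indices(12, 4, rng))
    print(f"epoch {epoch}:", [batch.tolist() for batch in batches])
```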

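This is the theory behind gradient descent, batches, epochs and iterations.

Now on to Keras and your code:

I ran the Colab code for you, and it works fine. In the code you posted, the number of epochs is 1, which is far too few for the model to learn, because there is very little data and the model itself is very simple. So you need to increase the amount of data, create a more complex model, or train for more epochs (400-500, from what I found in the notebook). With a properly tuned learning rate, the number of epochs can be reduced.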

learning_rate=0.14
epochs=70
my_batch_size= 32 

my_model = build_model(learning_rate)
trained_weight, trained_bias, epochs, rmse = train_model(my_model, my_feature, 
                                                        my_label, epochs,
                                                        my_batch_size)
plot_the_model(trained_weight, trained_bias, my_feature, my_label)
plot_the_loss_curve(epochs, rmse)

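If the learning rate is very small, the model learns slowly and therefore needs more training cycles (epochs) to make accurate predictions. Increasing the learning rate speeds up learning, so the number of epochs can be reduced. Please compare the different sections of the code in the Colab for suitable examples.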
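The effect of the learning rate can be reproduced with a NumPy sketch. It uses plain full-batch gradient descent rather than the Colab's RMSprop, an assumed start of w = 0.5, b = 0.0 and 200 epochs, so the numbers will not match the notebook; `final_mse` is a hypothetical helper. It also shows the limit of the advice: for this data a learning rate of 0.02 is already too large for plain gradient descent, which then overshoots and the loss blows up instead of falling.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0, 11.0, 12.0])
y = np.array([5.0, 8.8, 9.6, 14.2, 18.8, 19.5, 21.4, 26.8, 28.9, 32.0, 33.8, 38.2])

def final_mse(lr, epochs, w=0.5, b=0.0):
    """Run plain full-batch gradient descent on the MSE and return the final MSE."""
    for _ in range(epochs):
        err = w * x + b - y                  # prediction error with current w, b
        # Update both parameters from the same error (simultaneous update).
        w, b = w - lr * 2 * np.mean(err * x), b - lr * 2 * np.mean(err)
    return np.mean((w * x + b - y) ** 2)

for lr in (0.001, 0.01, 0.02):
    print(f"learning rate {lr}: MSE after 200 epochs = {final_mse(lr, 200):.4g}")
```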

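About getting the metrics for each iteration:

Keras is a high-level API of TensorFlow. As far as I know (leaving aside customizations of the API), during training Keras computes the loss, error and accuracy on the training set at the end of each iteration and returns their respective averages at the end of each epoch. So if there are n epochs, there will be n values of each of these metrics, no matter how many iterations happen in between.

About the slope and intercept:

A linear regression model uses the linear activation function at the output layer, which is y = mx + c. For the values we have: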

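y - the output
x - the input
m - the slope (has to be tuned)
c - the intercept (can also be tuned)

In our model, m and c are what we tune: they are our model's weight and bias. So our function looks like y = Wx + b, where b gives the intercept and W gives the slope. The weights and bias are initialized at the start; for a Keras Dense layer the defaults are a randomly initialized (Glorot uniform) kernel and a zero bias.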
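For the starting point the question asks about, the following NumPy sketch imitates the default initialization of a Keras `Dense` layer with one input and one unit: a kernel drawn from the Glorot (Xavier) uniform distribution with limit sqrt(6 / (fan_in + fan_out)), and a bias set to zero. It is an approximation for illustration, not Keras's own code, and `init_dense_1x1` is a hypothetical name.

```python
import numpy as np

def init_dense_1x1(rng):
    """Approximate Keras Dense defaults for 1 input and 1 unit:
    kernel ~ Glorot uniform, bias = 0."""
    fan_in, fan_out = 1, 1
    limit = np.sqrt(6.0 / (fan_in + fan_out))   # sqrt(3) for a 1x1 kernel
    w = rng.uniform(-limit, limit)              # random starting slope
    b = 0.0                                     # starting intercept
    return w, b

rng = np.random.default_rng(0)
for _ in range(3):
    print(init_dense_1x1(rng))
```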

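Colab link for a linear regression model implemented from scratch

Please adjust the values as needed. Since the model is implemented from scratch, you can collect or print any values you want to track during training. You can also use your own dataset, but make sure it is valid, or generated by a library (such as sklearn), so the model can be validated.

https://colab.research.google.com/drive/1RfuRNMoVv-l6KyM_SegdJOHiXD_0xBHq?usp=sharing

P.S. If you find anything confusing, please leave a comment. I'll be happy to reply.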

3 solutions

Solution 1: score 2, posted 2020-06-28 18:34:27

Solution 2: score 1, posted 2020-06-25 20:52:55

Solution 3: score -1, posted 2020-06-26 02:08:30
