Problems understanding linear regression model tuning in tf.keras

I am working on the Linear Regression with Synthetic Data Colab exercise, which explores linear regression with a toy dataset. A linear regression model is built and trained, and one can play around with the learning rate, the number of epochs, and the batch size. I have trouble understanding how exactly the iterations are done and how they connect to the "epoch" and the "batch size". In short, I am not getting how the actual model is trained, how the data is processed, and how the iterations are carried out. To understand this, I wanted to follow along by calculating each step manually, so I wanted to have the slope and intercept coefficients for each step. That way I can see what data the "computer" uses, what it puts into the model, what model results from each specific iteration, and how the iterations are done. I first tried to get the slope and intercept for each single step, but failed, because they are only output at the end. My modified code (original, with only this added):

  print("Slope")
  print(trained_weight)
  print("Intercept")
  print(trained_bias)

Code:

import pandas as pd
import tensorflow as tf
from matplotlib import pyplot as plt

#@title Define the functions that build and train a model
def build_model(my_learning_rate):
  """Create and compile a simple linear regression model."""
  # Most simple tf.keras models are sequential. 
  # A sequential model contains one or more layers.
  model = tf.keras.models.Sequential()

  # Describe the topography of the model.
  # The topography of a simple linear regression model
  # is a single node in a single layer. 
  model.add(tf.keras.layers.Dense(units=1, 
                                  input_shape=(1,)))

  # Compile the model topography into code that 
  # TensorFlow can efficiently execute. Configure 
  # training to minimize the model's mean squared error. 
  model.compile(optimizer=tf.keras.optimizers.RMSprop(learning_rate=my_learning_rate),
                loss="mean_squared_error",
                metrics=[tf.keras.metrics.RootMeanSquaredError()])
 
  return model           


def train_model(model, feature, label, epochs, batch_size):
  """Train the model by feeding it data."""

  # Feed the feature values and the label values to the 
  # model. The model will train for the specified number 
  # of epochs, gradually learning how the feature values
  # relate to the label values. 
  history = model.fit(x=feature,
                      y=label,
                      batch_size=batch_size,
                      epochs=epochs)

  # Gather the trained model's weight and bias.
  trained_weight = model.get_weights()[0]
  trained_bias = model.get_weights()[1]
  print("Slope")
  print(trained_weight)
  print("Intercept")
  print(trained_bias)
  # The list of epochs is stored separately from the 
  # rest of history.
  epochs = history.epoch

  # Gather the history (a snapshot) of each epoch.
  hist = pd.DataFrame(history.history)

 # print(hist)
  # Specifically gather the model's root mean 
  #squared error at each epoch. 
  rmse = hist["root_mean_squared_error"]

  return trained_weight, trained_bias, epochs, rmse

print("Defined create_model and train_model")

#@title Define the plotting functions
def plot_the_model(trained_weight, trained_bias, feature, label):
  """Plot the trained model against the training feature and label."""

  # Label the axes.
  plt.xlabel("feature")
  plt.ylabel("label")

  # Plot the feature values vs. label values.
  plt.scatter(feature, label)

  # Create a red line representing the model. The red line starts
  # at coordinates (x0, y0) and ends at coordinates (x1, y1).
  x0 = 0
  y0 = trained_bias
  x1 = feature[-1]
  y1 = trained_bias + (trained_weight * x1)
  plt.plot([x0, x1], [y0, y1], c='r')

  # Render the scatter plot and the red line.
  plt.show()

def plot_the_loss_curve(epochs, rmse):
  """Plot the loss curve, which shows loss vs. epoch."""

  plt.figure()
  plt.xlabel("Epoch")
  plt.ylabel("Root Mean Squared Error")

  plt.plot(epochs, rmse, label="Loss")
  plt.legend()
  plt.ylim([rmse.min()*0.97, rmse.max()])
  plt.show()

print("Defined the plot_the_model and plot_the_loss_curve functions.")

my_feature = ([1.0, 2.0,  3.0,  4.0,  5.0,  6.0,  7.0,  8.0,  9.0, 10.0, 11.0, 12.0])
my_label   = ([5.0, 8.8,  9.6, 14.2, 18.8, 19.5, 21.4, 26.8, 28.9, 32.0, 33.8, 38.2])

learning_rate=0.05
epochs=1
my_batch_size=12

my_model = build_model(learning_rate)
trained_weight, trained_bias, epochs, rmse = train_model(my_model, my_feature, 
                                                         my_label, epochs,
                                                         my_batch_size)
plot_the_model(trained_weight, trained_bias, my_feature, my_label)
plot_the_loss_curve(epochs, rmse)

In my specific case my output was:

[Image: training output showing a final loss of 535.48 and an RMSE of 23.1]

Now I tried to replicate this in a simple Excel sheet and calculated the RMSE manually:

[Image: manual RMSE calculation in an Excel sheet]

However, I get 21.8 and not 23.1. Also, my loss is not 535.48 but 476.82.

My first question is therefore: where is my mistake, and how is the RMSE calculated?

Second question(s): how can I get the RMSE for each specific iteration? Let's say the number of epochs is 4 and the batch size is 4.

[Image: the dataset split into batches of 4 examples]

That gives 4 epochs and 3 batches, each with 4 examples (observations). I don't understand how the model is trained over these iterations. How can I get the coefficients of each regression model and the RMSE, not just for each epoch (so 4 values) but for each iteration? I think each epoch has 3 iterations, so in total 12 linear regression models should result. I would like to see these 12 models. What initial values are used at the starting point when no information is given, i.e. what slope and intercept does training start from at the very first point? I don't specify this anywhere. Then I would like to be able to follow how the slope and intercept are adapted at each step. This comes from the gradient descent algorithm, I think, but that would be the super plus. Most important for me is first to understand how these iterations are done and how they connect to the epoch and batch size.

Update: I know that the initial values (for the slope and intercept) are chosen randomly.

Foundation

Problem statement

Let's consider a linear regression model for a set of samples X, where each sample is represented by one feature x. As part of model training, we are searching for the line w.x + b such that the squared loss ((w.x + b) - y)^2 is minimal. For a set of data points we take the mean of the squared loss of each sample, the so-called mean squared error (MSE). The w and b, which stand for weight and bias, are together referred to as weights.
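
To make this concrete, here is a minimal NumPy sketch, assuming the toy data from the question and an arbitrary guess for w and b, that computes the per-sample squared loss and the resulting MSE:

import numpy as np

# Toy data from the question (assumed here for illustration)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0, 11.0, 12.0])
y = np.array([5.0, 8.8, 9.6, 14.2, 18.8, 19.5, 21.4, 26.8, 28.9, 32.0, 33.8, 38.2])

w, b = 0.5, 0.0                      # a guessed slope and intercept
y_hat = w * x + b                    # predictions of the line
squared_loss = (y_hat - y) ** 2      # per-sample squared loss
mse = squared_loss.mean()            # mean squared error
rmse = np.sqrt(mse)                  # root mean squared error
print(mse, rmse)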

Fitting the line / Training the model

  1. We have a closed-form solution for the linear regression problem: (X^T X)^-1 X^T y (a NumPy sketch follows this list).
  2. We can also use the gradient descent method to search for weights which minimize the squared loss. Frameworks like TensorFlow and PyTorch use gradient descent to search for the weights (this is called training).
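
Here is a minimal NumPy sketch of the closed-form (normal equation) solution, again assuming the question's toy data; it yields the optimum that gradient-descent training should converge toward:

import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0, 11.0, 12.0])
y = np.array([5.0, 8.8, 9.6, 14.2, 18.8, 19.5, 21.4, 26.8, 28.9, 32.0, 33.8, 38.2])

# Design matrix with a column of ones so that the intercept b is learned too
X = np.column_stack([x, np.ones_like(x)])

# Normal equation (X^T X)^-1 X^T y, solved without forming an explicit inverse
w, b = np.linalg.solve(X.T @ X, X.T @ y)
print(f"slope={w:.4f}, intercept={b:.4f}")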

Gradient descent

A gradient descent algorithm for learning the regression looks like below:

w, b = some initial value
While model has not converged:
    y_hat = w.X + b
    error = MSE(y, y_hat) 
    back propagate (BPP) error and adjust weights

Each run of the above loop is called an epoch. However, due to resource constraints, the calculation of y_hat, error, and the backpropagation (BPP) is not performed on the full dataset; instead the data is divided into smaller batches, and the above operations are performed on one batch at a time. Also, we normally fix the number of epochs and monitor whether the model has converged.

w, b = some initial value
for i in range(number_of_epochs):
    for X_batch, y_batch in get_next_batch(X, y):
        y_hat = w.X_batch + b
        error = MSE(y_batch, y_hat)
        back propagate (BPP) error and adjust weights
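
To make the pseudocode concrete, the following is a runnable NumPy sketch of mini-batch gradient descent for this problem. It uses plain gradient-descent updates rather than the RMSprop optimizer from the Colab, and the learning rate and initial values are assumptions chosen for illustration:

import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0, 11.0, 12.0])
y = np.array([5.0, 8.8, 9.6, 14.2, 18.8, 19.5, 21.4, 26.8, 28.9, 32.0, 33.8, 38.2])

w, b = 0.0, 0.0            # initial values (Keras would draw w randomly)
learning_rate = 0.005
batch_size = 4
number_of_epochs = 4

for epoch in range(number_of_epochs):
    for start in range(0, len(x), batch_size):
        x_batch = x[start:start + batch_size]
        y_batch = y[start:start + batch_size]

        y_hat = w * x_batch + b                  # forward pass
        error = y_hat - y_batch                  # residuals
        # Gradients of the batch MSE with respect to w and b
        grad_w = 2 * np.mean(error * x_batch)
        grad_b = 2 * np.mean(error)
        # Weight update: happens once per batch, i.e. once per "iteration"
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
        print(f"epoch={epoch} batch={start // batch_size} w={w:.4f} b={b:.4f}")

With 12 samples, a batch size of 4, and 4 epochs, this prints 12 intermediate (w, b) pairs, which are exactly the "12 models" the question asks about.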

Keras implementation of batches

Let's say we would like to add the root mean squared error for tracing the model's performance while it is training. The way Keras implements this is as below:

w, b = some initial value
for i in range(number_of_epochs):
    all_y_hats = []
    all_ys = []
    for X_batch, y_batch in get_next_batch(X, y):
        y_hat = w.X_batch + b
        error = MSE(y_batch, y_hat)

        all_y_hats.extend(y_hat)
        all_ys.extend(y_batch)

        batch_rms_error = RMSE(all_ys, all_y_hats)

        back propagate (BPP) error and adjust weights

As you can see above, the predictions are accumulated and the RMSE is calculated on the accumulated predictions, rather than taking the mean of all the previous batch RMSEs.
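
A tiny numeric sketch (with made-up residuals) showing why the accumulated RMSE differs from the mean of the per-batch RMSEs:

import numpy as np

batch1_errors = np.array([0.0, 2.0])   # per-sample errors in batch 1
batch2_errors = np.array([0.0, 0.0])   # per-sample errors in batch 2

rmse1 = np.sqrt(np.mean(batch1_errors ** 2))            # 1.414
rmse2 = np.sqrt(np.mean(batch2_errors ** 2))            # 0.0
mean_of_batch_rmses = (rmse1 + rmse2) / 2               # 0.707

all_errors = np.concatenate([batch1_errors, batch2_errors])
rmse_accumulated = np.sqrt(np.mean(all_errors ** 2))    # 1.0 <- what Keras reports

print(mean_of_batch_rmses, rmse_accumulated)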

Implementation in Keras

Now that our foundation is clear, let's see how we can implement this tracking in Keras. Keras has callbacks, so we can hook into the on_batch_begin callback and accumulate all_y_hats and all_ys. On the on_batch_end callback, Keras gives us the calculated RMSE. We will manually calculate the RMSE using our accumulated all_y_hats and all_ys and verify that it is the same as what Keras calculated. We will also save the weights so that we can later plot the line which is being learned.

import numpy as np
from sklearn.metrics import mean_squared_error
import keras
import matplotlib.pyplot as plt

# Some training data
X = np.arange(16)
y = 0.5*X +0.2

batch_size = 8
all_y_hats = []
learned_weights = [] 

class CustomCallback(keras.callbacks.Callback):
  def on_batch_begin(self, batch, logs={}):    
    w = self.model.layers[0].weights[0].numpy()[0][0]
    b = self.model.layers[0].weights[1].numpy()[0]    
    s = batch*batch_size
    all_y_hats.extend(b + w*X[s:s+batch_size])    
    learned_weights.append([w,b])

  def on_batch_end(self, batch, logs={}):    
    calculated_error = np.sqrt(mean_squared_error(all_y_hats, y[:len(all_y_hats)]))
    print (f"\n Calculated: {calculated_error},  Actual: {logs['root_mean_squared_error']}")
    assert np.isclose(calculated_error, logs['root_mean_squared_error'])

  def on_epoch_end(self, batch, logs={}):
    del all_y_hats[:]    


model = keras.models.Sequential()
model.add(keras.layers.Dense(1, input_shape=(1,)))
model.compile(optimizer=keras.optimizers.RMSprop(learning_rate=0.01), loss="mean_squared_error", metrics=[keras.metrics.RootMeanSquaredError()])
# We should set shuffle=False so that we know how batches are divided
history = model.fit(X,y, epochs=100, callbacks=[CustomCallback()], batch_size=batch_size, shuffle=False) 

Output:

Epoch 1/100
 8/16 [==============>...............] - ETA: 0s - loss: 16.5132 - root_mean_squared_error: 4.0636
 Calculated: 4.063645694548688,  Actual: 4.063645839691162

 Calculated: 8.10112834945773,  Actual: 8.101128578186035
16/16 [==============================] - 0s 3ms/step - loss: 65.6283 - root_mean_squared_error: 8.1011
Epoch 2/100
 8/16 [==============>...............] - ETA: 0s - loss: 14.0454 - root_mean_squared_error: 3.7477
 Calculated: 3.7477213352845675,  Actual: 3.7477214336395264
-------------- truncated -----------------------

Ta-da! The assert np.isclose(calculated_error, logs['root_mean_squared_error']) never failed, so our calculation/understanding is correct.

The line

Finally, let's plot the line which is being adjusted by the BPP algorithm based on the mean squared error loss. We can use the code below to create a PNG image of the line being learned at each batch, along with the training data.

for i, (w,b) in enumerate(learned_weights):
  plt.close()
  plt.axis([-1, 18, -1, 10])
  plt.scatter(X, y)
  plt.plot([-1,17], [-1*w+b, 17*w+b], color='green')
  plt.savefig(f'img{i+1}.png')

Below is a GIF animation of the above images in the order they were learned.

[GIF: the line being adjusted batch by batch for y = 0.5*X + 0.2]

The hyperplane (a line in this case) being learned when y = 0.5*X + 5.2:

[GIF: the line being learned for y = 0.5*X + 5.2]

I tried to play with it a little, and I think it works like this:

  1. Weights (usually random, depending on settings) for each feature are initialized, along with the bias, which initially is 0.0.
  2. The loss and metrics for the first batch are computed and printed, and the weights and bias are updated.
  3. Step 2 is repeated for all batches in the epoch; however, after the last batch the loss and metrics are not printed, so what you see on screen are the loss and metrics before the last update in the epoch.
  4. A new epoch is started, and the first metrics and loss you see printed are actually those computed on the last updated weights from the previous epoch...

So basically, I think it can intuitively be said that first the loss is computed, then the weights are updated, which means that the weight update is the last operation in the epoch.

If your model is trained using one epoch and one batch, then what you see on screen is the loss computed on the initial weights and bias. If you want to see the loss and metrics after the end of each epoch (with the most "actual" weights), you can pass validation_data=(X,y) to the fit method. That tells the algorithm to compute the loss and metrics once again on this given validation data when the epoch is finished.

Regarding the initial weights of the model, you can try it out by manually setting some initial weights on the layer (using the kernel_initializer parameter):

  model.add(tf.keras.layers.Dense(units=1,
                                  input_shape=(1,),
                                  kernel_initializer=tf.constant_initializer(.5)))

Here is the updated part of the train_model function, which shows what I meant:

  import numpy as np

  def train_model(model, feature, label, epochs, batch_size):
        """Train the model by feeding it data."""

        # Feed the feature values and the label values to the
        # model. The model will train for the specified number
        # of epochs, gradually learning how the feature values
        # relate to the label values.
        init_slope = model.get_weights()[0][0][0]
        init_bias = model.get_weights()[1][0]
        print('init slope is {}'.format(init_slope))
        print('init bias is {}'.format(init_bias))

        history = model.fit(x=feature,
                          y=label,
                          batch_size=batch_size,
                          epochs=epochs,
                          validation_data=(feature,label))

        # Gather the trained model's weight and bias.
        #print(model.get_weights())
        trained_weight = model.get_weights()[0]
        trained_bias = model.get_weights()[1]
        print("Slope")
        print(trained_weight)
        print("Intercept")
        print(trained_bias)
        # The list of epochs is stored separately from the
        # rest of history.
        prediction_manual = [trained_weight[0][0]*i + trained_bias[0] for i in feature]

        manual_loss = np.mean(((np.array(label)-np.array(prediction_manual))**2))
        print('manually computed loss after slope and bias update is {}'.format(manual_loss))
        print('manually computed rmse after slope and bias update is {}'.format(manual_loss**(1/2)))

        prediction_manual_init = [init_slope*i + init_bias for i in feature]
        manual_loss_init = np.mean(((np.array(label)-np.array(prediction_manual_init))**2))
        print('manually computed loss with init slope and bias is {}'.format(manual_loss_init))
        print('manually computed rmse with init slope and bias is {}'.format(manual_loss_init**(1/2)))

Output:

"""
init slope is 0.5
init bias is 0.0
1/1 [==============================] - 0s 117ms/step - loss: 402.9850 - root_mean_squared_error: 20.0745 - val_loss: 352.3351 - val_root_mean_squared_error: 18.7706
Slope
[[0.65811384]]
Intercept
[0.15811387]
manually computed loss after slope and bias update is 352.3350379264957
manually computed rmse after slope and bias update is 18.77058970641295
manually computed loss with init slope and bias is 402.98499999999996
manually computed rmse with init slope and bias is 20.074486294797182
"""

Note that the manually computed loss and metrics after the slope and bias update match the validation loss and metrics, and the manually computed loss and metrics before the update match the loss and metrics for the initial slope and bias.


Regarding the second question, I think you could split your data into batches manually and then iterate over each batch and fit on it. Then, in each iteration, the model prints the loss and metrics for the validation data. Something like this:

  init_slope = model.get_weights()[0][0][0]
  init_bias = model.get_weights()[1][0]
  print('init slope is {}'.format(init_slope))
  print('init bias is {}'.format(init_bias))
  batch_size = 3

  for idx in range(0,len(feature),batch_size):
      model.fit(x=feature[idx:idx+batch_size],
                y=label[idx:idx+batch_size],
                batch_size=1000,
                epochs=epochs,
                validation_data=(feature,label))
      print('slope: {}'.format(model.get_weights()[0][0][0]))
      print('intercept: {}'.format(model.get_weights()[1][0]))
      print('x data used: {}'.format(feature[idx:idx+batch_size]))
      print('y data used: {}'.format(label[idx:idx+batch_size]))

Output:

init slope is 0.5
init bias is 0.0
1/1 [==============================] - 0s 117ms/step - loss: 48.9000 - root_mean_squared_error: 6.9929 - val_loss: 352.3351 - val_root_mean_squared_error: 18.7706
slope: 0.6581138372421265
intercept: 0.15811386704444885
x data used: [1.0, 2.0, 3.0]
y data used: [5.0, 8.8, 9.6]
1/1 [==============================] - 0s 21ms/step - loss: 200.9296 - root_mean_squared_error: 14.1750 - val_loss: 306.3082 - val_root_mean_squared_error: 17.5017
slope: 0.8132714033126831
intercept: 0.3018075227737427
x data used: [4.0, 5.0, 6.0]
y data used: [14.2, 18.8, 19.5]
1/1 [==============================] - 0s 22ms/step - loss: 363.2630 - root_mean_squared_error: 19.0595 - val_loss: 266.7119 - val_root_mean_squared_error: 16.3313
slope: 0.9573485255241394
intercept: 0.42669767141342163
x data used: [7.0, 8.0, 9.0]
y data used: [21.4, 26.8, 28.9]
1/1 [==============================] - 0s 22ms/step - loss: 565.5593 - root_mean_squared_error: 23.7815 - val_loss: 232.1553 - val_root_mean_squared_error: 15.2366
slope: 1.0924618244171143
intercept: 0.5409283638000488
x data used: [10.0, 11.0, 12.0]
y data used: [32.0, 33.8, 38.2]

Linear Regression Model

A linear regression model has only one neuron, with a linear activation function. The basis of training the model is that we use gradient descent. Each time the entire dataset is passed through the model and the weights are updated, this is called 1 epoch. In that case there is no difference between an iteration and an epoch.

Basic training steps:

Prepare data
Initialize the model and its parameters (weights and biases)
for each epoch:  #(both iteration and epoch same here)
    Forward Propagation
    Compute Cost
    Back Propagation
    Update Parameters

Gradient descent has three variants:

  • Batch Gradient Descent (BGD)
  • Stochastic Gradient Descent (SGD)
  • Mini-Batch Gradient Descent (MBGD)

Batch gradient descent is what we talked about earlier (passing the entire dataset), in general also just known as gradient descent.

In stochastic gradient descent we pass 1 random example at a time, and the weights are updated with every example passed. Now iterations come into play: on completing training on 1 example, 1 iteration is completed. However, there are more examples in the dataset that the model has not seen yet; training on all of them once is called 1 epoch. Since 1 example is passed at a time, SGD is very slow for larger datasets, as it loses the benefit of vectorization.

So we generally use mini-batch gradient descent. Here the dataset is divided into a number of chunks of fixed size. The size of each chunk of data is called the batch size, and it can be anywhere between 1 and the data size. On each epoch these batches of data are used to train the model.

1 iteration processes 1 batch of data. 1 epoch processes all batches of data. 1 epoch therefore contains 1 or more iterations.

Thus, if the size of the data is m, the amount of data fed during each iteration is (see the sketch after this list for the resulting iteration counts):

  • BGD: m
  • SGD: 1
  • MBGD: 1 < x < m
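
For the question's concrete numbers (12 samples, batch size 4, 4 epochs, taken from the question), a quick sketch of the arithmetic:

import math

m = 12               # number of samples
batch_size = 4
epochs = 4

iterations_per_epoch = math.ceil(m / batch_size)   # 12 / 4 = 3 batches
total_weight_updates = epochs * iterations_per_epoch

print(iterations_per_epoch)    # 3 iterations per epoch
print(total_weight_updates)    # 12 updates -> the "12 models" in the question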

Basic training steps for MBGD:

Prepare data
Initialize the model and its parameters (weights and biases)
for each epoch:  #(epoch)
    for each mini_batch: #(iteration)
        Forward Propagation
        Compute Cost
        Back Propagation
        Update Parameters

This is the theoretical concept behind gradient descent, batches, epochs, and iterations.

Now moving on to Keras and your code:

I ran your Colab code and it works perfectly fine. In the code you have posted, the number of epochs is 1, which is extremely small for the model to learn anything, since there is very little data and the model itself is very simple. So you need to either increase the data volume, or create a more complex model, or train for a larger number of epochs, 400-500 from what I found in the notebook. When the learning rate is adjusted properly, the number of epochs can be decreased, as in:

learning_rate=0.14
epochs=70
my_batch_size = 32

my_model = build_model(learning_rate)
trained_weight, trained_bias, epochs, rmse = train_model(my_model, my_feature, 
                                                        my_label, epochs,
                                                        my_batch_size)
plot_the_model(trained_weight, trained_bias, my_feature, my_label)
plot_the_loss_curve(epochs, rmse)

If the learning rate is very small, the model will learn slowly, so it requires more training cycles (epochs) to make accurate predictions. Increasing the learning rate speeds up the learning process, so the number of epochs can be decreased. Please compare the different sections of the code in the Colab for proper examples.

Regarding getting metrics for each iteration:

Keras is a high-level API for TensorFlow. As far as I know (not considering customization of the API), during training Keras calculates the loss, errors, and accuracy on the training set at the end of each iteration, and at the end of each epoch it returns their respective averages. So if there are n epochs, there will be n values of each of these metrics, no matter how many iterations happen in between.
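
Here is a minimal sketch illustrating that; the model and data are stand-ins invented for the demonstration:

import numpy as np
import tensorflow as tf

X = np.arange(12, dtype="float32")
y = 3.0 * X + 1.0

model = tf.keras.models.Sequential([tf.keras.layers.Dense(1, input_shape=(1,))])
model.compile(optimizer="rmsprop", loss="mean_squared_error",
              metrics=[tf.keras.metrics.RootMeanSquaredError()])

# 12 samples / batch_size 4 = 3 iterations per epoch, but history still
# holds exactly one (epoch-level) value per metric per epoch.
history = model.fit(X, y, epochs=5, batch_size=4, verbose=0)
print(len(history.history["root_mean_squared_error"]))  # -> 5, one per epoch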

Regarding the slope and the intercept:

A linear regression model uses a linear activation function at the output layer, i.e. y = mx + c. For the values we have:

  • y - refers to the output
  • x - refers to the inputs
  • m - refers to the slope (that has to be adjusted)
  • c - refers to the intercept (that can also be adjusted)

In our model, this m and c are what we adjust: they are the weight and bias of our model. So our function looks like y = Wx + b, where b gives the intercept and W gives the slope. The weights and biases are initialized automatically at the beginning (the kernel randomly, the bias to zero by default), as the sketch below shows.
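
Here is a small sketch for inspecting the fresh weights of an untrained Dense(1) layer. Note that in Keras the kernel (our slope W) is drawn from a Glorot uniform distribution by default, while the bias defaults to zeros rather than a random value:

import tensorflow as tf

# Specifying input_shape builds the layer immediately, so the initial
# weights already exist before any training has happened.
model = tf.keras.models.Sequential([tf.keras.layers.Dense(1, input_shape=(1,))])

w, b = model.get_weights()
print("initial slope:", w[0][0])     # random (Glorot uniform by default)
print("initial intercept:", b[0])    # 0.0 (bias defaults to zeros)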

Colab link for a Linear Regression Model from scratch:

Please tweak the values as needed. Since the model is implemented from scratch, collect or print any value you want to track during training. You may also use your own dataset, but make sure it is valid or generated by some library for model validation (sklearn).

https://colab.research.google.com/drive/1RfuRNMoVv-l6KyM_SegdJOHiXD_0xBHq?usp=sharing

P.S. If you find anything confusing, please comment. I would be happy to reply.
