How to accumulate gradients in TensorFlow 2.0?

Question

I'm training a model with TensorFlow 2.0. The images in my training set have different resolutions. The model I've built can handle variable resolutions (conv layers followed by global averaging). My training set is very small, and I want to use the full training set as a single batch.

Since my images have different resolutions, I can't use model.fit(). So I'm planning to pass each sample through the network individually, accumulate the errors/gradients, and then apply one optimizer step. I'm able to compute the loss values, but I don't know how to accumulate the losses/gradients. How can I accumulate the losses/gradients and then apply a single optimizer step?

Code:

for i in range(num_epochs):
    print(f'Epoch: {i + 1}')
    total_loss = 0
    for j in tqdm(range(num_samples)):
        sample = samples[j]
        with tf.GradientTape() as tape:
            prediction = self.model(sample)
            loss_value = self.loss_function(y_true=labels[j], y_pred=prediction)
        gradients = tape.gradient(loss_value, self.model.trainable_variables)
        self.optimizer.apply_gradients(zip(gradients, self.model.trainable_variables))
        total_loss += loss_value

    epoch_loss = total_loss / num_samples
    print(f'Epoch loss: {epoch_loss}')

Answer 1

Based on the Stack Overflow answer and the explanation provided on the TensorFlow website, the code below accumulates gradients in TensorFlow 2.0:

def train(epochs):
    for epoch in range(epochs):
        for (batch, (images, labels)) in enumerate(dataset):
            with tf.GradientTape() as tape:
                logits = mnist_model(images, training=True)
                tvs = mnist_model.trainable_variables
                accum_vars = [tf.Variable(tf.zeros_like(tv.initialized_value()), trainable=False) for tv in tvs]
                zero_ops = [tv.assign(tf.zeros_like(tv)) for tv in accum_vars]
                loss_value = loss_object(labels, logits)

            loss_history.append(loss_value.numpy().mean())
            grads = tape.gradient(loss_value, tvs)
            # print(grads[0].shape)
            # print(accum_vars[0].shape)
            accum_ops = [accum_vars[i].assign_add(grad) for i, grad in enumerate(grads)]

        optimizer.apply_gradients(zip(grads, mnist_model.trainable_variables))
        print('Epoch {} finished'.format(epoch))

# Call the above function
train(epochs=3)

The complete code can be found in this GitHub Gist.

Answer 2

If I understand this statement correctly:

How can I accumulate the losses/gradients and then apply a single optimizer step?

@Nagabhushan is trying to accumulate gradients and then apply the optimization step to the (mean) accumulated gradient. The answer provided by @TensorflowSupport does not address this. To accumulate the gradients from several tapes and perform the optimization only once, you can do the following:

for i in range(num_epochs):
    print(f'Epoch: {i + 1}')
    total_loss = 0

    # Get the trainable variables
    train_vars = self.model.trainable_variables
    # Create a list of zero tensors to hold the accumulated gradients (not a tf.Variable list)
    accum_gradient = [tf.zeros_like(this_var) for this_var in train_vars]

    for j in tqdm(range(num_samples)):
        sample = samples[j]
        with tf.GradientTape() as tape:
            prediction = self.model(sample)
            loss_value = self.loss_function(y_true=labels[j], y_pred=prediction)
        total_loss += loss_value

        # Get the gradients recorded by this tape
        gradients = tape.gradient(loss_value, train_vars)
        # Accumulate the gradients
        accum_gradient = [(accum_grad + grad) for accum_grad, grad in zip(accum_gradient, gradients)]

    # After running all the tapes, take the average of the accumulated gradients
    accum_gradient = [this_grad / num_samples for this_grad in accum_gradient]
    # Apply a single optimization step
    self.optimizer.apply_gradients(zip(accum_gradient, train_vars))

    epoch_loss = total_loss / num_samples
    print(f'Epoch loss: {epoch_loss}')
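As a sanity check on the averaging step, here is a small framework-free sketch (not part of the original answer) showing that the mean of per-sample gradients matches the gradient of the mean loss over the whole set. The toy data, the linear model, and the helper names sample_grad and mean_loss are all made up for illustration.

```python
# Toy data and parameters (hypothetical values chosen for illustration).
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.1, 5.9, 8.2]
w, b = 0.5, 0.1


def sample_grad(w, b, x, y):
    """Analytic gradient of the squared error (w*x + b - y)**2 w.r.t. (w, b)."""
    err = w * x + b - y
    return 2.0 * err * x, 2.0 * err


def mean_loss(w, b):
    """Mean squared error over the whole toy set (the 'single batch' loss)."""
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Accumulate per-sample gradients, then average once at the end.
acc_w, acc_b = 0.0, 0.0
for x, y in zip(xs, ys):
    gw, gb = sample_grad(w, b, x, y)
    acc_w += gw
    acc_b += gb
avg_w, avg_b = acc_w / len(xs), acc_b / len(xs)

# Reference: central finite differences of the full-batch mean loss.
eps = 1e-6
ref_w = (mean_loss(w + eps, b) - mean_loss(w - eps, b)) / (2 * eps)
ref_b = (mean_loss(w, b + eps) - mean_loss(w, b - eps)) / (2 * eps)

print(abs(avg_w - ref_w) < 1e-5, abs(avg_b - ref_b) < 1e-5)
```

This equivalence holds when the batch loss is the plain mean of the per-sample losses; if the loss uses a different reduction (for example a sum), the division by num_samples should be adjusted accordingly.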

Creating a tf.Variable() inside the training loop should be avoided, since it produces errors when the code is executed as a graph. If you use tf.Variable() inside your training function and then decorate it with "@tf.function", or apply "tf.function(my_train_fcn)" to obtain a graph function (i.e. for improved performance), the execution will raise an error. This happens because tf.Variable behaves differently when traced than in eager execution: tracing reuses the variable, whereas eager execution creates a new one on every call. You can find more information on this in the TensorFlow help page.
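If you do want tf.Variable accumulators together with tf.function, one possible arrangement is to create the variables once, outside the traced function, and only update them inside it. The sketch below is an illustration under assumptions, not the code from either answer: the toy model, the random samples, the labels, and the function name accumulate are all hypothetical, and the optimizer step is kept in eager mode for simplicity.

```python
import tensorflow as tf

# Hypothetical toy model: conv layer + global average pooling, so any H x W works.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(None, None, 3)),
    tf.keras.layers.Conv2D(4, 3, padding="same", activation="relu"),
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(1),
])
optimizer = tf.keras.optimizers.SGD(learning_rate=0.01)
loss_fn = tf.keras.losses.MeanSquaredError()

# Accumulators are created exactly once, outside any tf.function.
accum = [tf.Variable(tf.zeros(v.shape, dtype=v.dtype), trainable=False)
         for v in model.trainable_variables]


# A fixed input_signature with unknown height/width avoids retracing per resolution.
@tf.function(input_signature=[
    tf.TensorSpec([1, None, None, 3], tf.float32),
    tf.TensorSpec([1, 1], tf.float32),
])
def accumulate(sample, label):
    with tf.GradientTape() as tape:
        loss = loss_fn(label, model(sample, training=True))
    grads = tape.gradient(loss, model.trainable_variables)
    for acc, grad in zip(accum, grads):
        acc.assign_add(grad)  # in-place update of an existing variable
    return loss


# Made-up samples of different resolutions, one image per call.
samples = [tf.random.normal([1, h, w, 3]) for h, w in [(8, 8), (12, 10), (16, 6)]]
labels = [tf.constant([[1.0]]) for _ in samples]

total_loss = 0.0
for sample, label in zip(samples, labels):
    total_loss += float(accumulate(sample, label))

# One optimizer step on the averaged gradient, then reset the accumulators.
mean_grads = [acc / len(samples) for acc in accum]
optimizer.apply_gradients(zip(mean_grads, model.trainable_variables))
for acc in accum:
    acc.assign(tf.zeros_like(acc))

print(f"Epoch loss: {total_loss / len(samples)}")
```

Because the accumulators already exist when accumulate is traced, the graph only contains assign_add operations on them, so no variable is created during tracing.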

2 answers

Answer 1: 3, 2020-02-11 13:47:28

Answer 2: 3, accepted, 2020-07-01 19:06:47
