How to override the gradient-vector calculation of optimization algorithms in Keras / TensorFlow?

Question

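So I am trying to modify a couple of the optimization algorithms in Keras, namely Adam or just SGD. By default, I'm pretty sure parameter updates work by averaging the loss over the data points in the batch and then computing the gradient vector from that loss value. Another way to think about it is that you average the gradients of the loss of each individual data point in the batch. This is the computation I want to change. It will be expensive, so I am trying to do it inside the optimized framework that uses the GPU and all that.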
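
The two views can be compared in a small NumPy sketch. It assumes a toy linear model with a squared-error loss per example; the model, the data and the helper names `per_example_grads` and `grad_of_mean_loss` are illustrative only. Averaging the per-example gradients gives the same vector as differentiating the batch-averaged loss, and once the per-example gradients exist, another aggregation (here a coordinate-wise median, chosen arbitrarily) can replace the mean.

```python
import numpy as np

# Toy linear model y_hat = X @ w with a squared-error loss per example:
#   loss_i(w) = (x_i . w - y_i)^2,  grad_i(w) = 2 * (x_i . w - y_i) * x_i
def per_example_grads(X, y, w):
    residuals = X @ w - y                  # shape (batch,)
    return 2.0 * residuals[:, None] * X    # shape (batch, n_features); row i = grad_i

def grad_of_mean_loss(X, y, w):
    # Gradient of (1/B) * sum_i loss_i, written directly
    residuals = X @ w - y
    return 2.0 * X.T @ residuals / X.shape[0]

rng = np.random.default_rng(0)
X = rng.normal(size=(8, 3))
y = rng.normal(size=8)
w = rng.normal(size=3)

G = per_example_grads(X, y, w)
# Averaging the per-example gradients reproduces the usual gradient ...
print(np.allclose(G.mean(axis=0), grad_of_mean_loss(X, y, w)))  # True
# ... but with G in hand, any other aggregation is possible, e.g. a median
median_grad = np.median(G, axis=0)
```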

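So for each batch, I need to compute the gradient of the loss of each data point in the batch separately, and then, instead of taking the mean of those gradients, I will compute some other average or statistic. Does anyone know how I can access this part of Adam or SGD in order to override it?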

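Thanks to some great comments, I found out that there should be a way to do what I want with the jacobian method of GradientTape. However, the documentation is not very thorough, and I can't figure out how it fits into the overall picture. I'm hoping someone can help me adapt the code to use jacobian instead of gradient.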

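As a hello-world example, I am trying to simply replace the gradient line with some code that uses jacobian and produces the same output. That would illustrate how to use the jacobian method and how its output relates to the output of the gradient method.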
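
Such a hello world can be sketched outside of `train_step`, assuming a hypothetical single weight matrix `w` and a made-up batch of four examples. Applied to the un-reduced per-example loss, `tape.jacobian` returns one gradient per example, and averaging it over the leading batch axis should reproduce what `tape.gradient` returns for the batch-mean loss. The tape is created with `persistent=True` only because it is queried twice here.

```python
import numpy as np
import tensorflow as tf

# Illustrative setup: one weight matrix, a batch of 4 inputs, squared error
tf.random.set_seed(0)
w = tf.Variable(tf.random.normal([3, 1]))
x = tf.random.normal([4, 3])
y = tf.random.normal([4, 1])

with tf.GradientTape(persistent=True) as tape:  # persistent: queried twice below
    y_pred = tf.matmul(x, w)                                           # (4, 1)
    per_example_loss = tf.reduce_mean(tf.square(y_pred - y), axis=1)  # (4,), not batch-averaged
    mean_loss = tf.reduce_mean(per_example_loss)                       # scalar, the usual Keras loss

grad = tape.gradient(mean_loss, w)          # shape (3, 1)
jac = tape.jacobian(per_example_loss, w)    # shape (4, 3, 1): one gradient per example
del tape

print(jac.shape)  # (4, 3, 1)
# Averaging the Jacobian over the batch axis recovers the ordinary gradient
print(np.allclose(tf.reduce_mean(jac, axis=0).numpy(), grad.numpy(), atol=1e-6))  # True
```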

Working code

import tensorflow as tf
from tensorflow import keras

class CustomModel(keras.Model):
    def train_step(self, data):
        # Unpack the data. Its structure depends on your model and
        # on what you pass to `fit()`.
        x, y = data

        with tf.GradientTape() as tape:
            y_pred = self(x, training=True)  # Forward pass
            # Compute the loss value
            # (the loss function is configured in `compile()`)
            loss = self.compiled_loss(y, y_pred, regularization_losses=self.losses)

        # Compute gradients
        trainable_vars = self.trainable_variables
        gradients = tape.gradient(loss, trainable_vars) # <-- line to change
        # Update weights
        self.optimizer.apply_gradients(zip(gradients, trainable_vars))
        # Update metrics (includes the metric that tracks the loss)
        self.compiled_metrics.update_state(y, y_pred)
        # Return a dict mapping metric names to current value
        return {m.name: m.result() for m in self.metrics}

Answer 1

You should be able to do something like the following:

import tensorflow as tf
from tensorflow import keras

class CustomModel(keras.Model):
    def train_step(self, data):
        # Unpack the data. Its structure depends on your model and
        # on what you pass to `fit()`.
        x, y = data

        with tf.GradientTape() as tape:
            y_pred = self(x, training=True)  # Forward pass
            # Compute the loss value
            # (the loss function is configured in `compile()`)
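            # NOTE: assumed to be a per-example loss of shape (batch_size,), not a scalar
            loss = self.compiled_loss(y, y_pred, regularization_losses=self.losses)

        # Compute per-example gradients: for each variable, jacobian returns
        # a tensor of shape (batch_size,) + variable_shape
        trainable_vars = self.trainable_variables
        gradients = tape.jacobian(loss, trainable_vars)

        new_gradients = []
        for grad in gradients:
            # `do_something_to` is a placeholder you must define; it has to remove
            # the batch axis so that new_grad has the same shape as its variable
            new_grad = do_something_to(grad)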
            new_gradients.append(new_grad)

        # Update weights
        self.optimizer.apply_gradients(zip(new_gradients, trainable_vars))
        # Update metrics (includes the metric that tracks the loss)
        self.compiled_metrics.update_state(y, y_pred)
        # Return a dict mapping metric names to current value
        return {m.name: m.result() for m in self.metrics}
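
The same pattern can be tried end to end without a Keras model. This is a sketch under several assumptions: a toy linear-regression problem with made-up data, a hypothetical `clip_then_mean` standing in for `do_something_to` (it clips each example's gradient for a variable to a maximum L2 norm and then averages, which is just one possible choice), and plain SGD written out by hand instead of `self.optimizer.apply_gradients`, so that only `tf.Variable.assign_sub` is needed for the update.

```python
import tensorflow as tf

# Hypothetical aggregation in place of `do_something_to`: clip each example's
# gradient (for one variable) to an L2 norm of at most clip_norm, then average
# over the batch axis so the result has the variable's shape.
def clip_then_mean(per_example_grad, clip_norm=1.0):
    batch_size = tf.shape(per_example_grad)[0]
    flat = tf.reshape(per_example_grad, [batch_size, -1])          # (batch, n_params)
    norms = tf.norm(flat, axis=1, keepdims=True)                   # (batch, 1)
    scale = tf.minimum(1.0, clip_norm / tf.maximum(norms, 1e-12))  # shrink only large grads
    mean_grad = tf.reduce_mean(flat * scale, axis=0)               # (n_params,)
    return tf.reshape(mean_grad, tf.shape(per_example_grad)[1:])   # variable_shape

# Toy linear-regression problem (illustrative data)
tf.random.set_seed(0)
w = tf.Variable(tf.zeros([3, 1]))
b = tf.Variable(tf.zeros([1]))
x = tf.random.normal([32, 3])
y = tf.matmul(x, tf.constant([[1.0], [-2.0], [0.5]])) + 0.3
learning_rate = 0.1

def train_step(x, y):
    with tf.GradientTape() as tape:
        y_pred = tf.matmul(x, w) + b
        # Per-example loss of shape (batch_size,): no averaging over the batch
        per_example_loss = tf.reduce_mean(tf.square(y_pred - y), axis=1)
    trainable_vars = [w, b]
    # List of tensors, each of shape (batch_size,) + variable_shape
    gradients = tape.jacobian(per_example_loss, trainable_vars)
    new_gradients = [clip_then_mean(g) for g in gradients]
    # Plain SGD written out by hand (in place of self.optimizer.apply_gradients)
    for var, grad in zip(trainable_vars, new_gradients):
        var.assign_sub(learning_rate * grad)
    return float(tf.reduce_mean(per_example_loss))

losses = [train_step(x, y) for _ in range(100)]
print(losses[0], losses[-1])
```

Inside a real `train_step`, the `zip(new_gradients, trainable_vars)` call to the compiled optimizer from the answer would replace the manual update loop.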

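Some important notes: the loss returned by the compiled_loss function must not be averaged over the batch axis, i.e. I am assuming it is a tensor of shape (batch_size,) rather than a scalar. By default, Keras losses reduce over the batch, so the loss has to be configured (or computed) without that reduction.
This causes jacobian to return gradients of shape (batch_size,) + variable_shape, that is, you now have one gradient per batch element. You can manipulate these gradients however you like, but at some point you have to get rid of the extra batch axis (e.g. by averaging). In other words, new_grad must have the same shape as the corresponding variable.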
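
As one hypothetical `do_something_to`, a coordinate-wise median removes the batch axis while keeping the variable's shape. The sketch below assumes a median is what is wanted (any reduction over axis 0 would do) and uses only `tf.sort` and indexing, since core TensorFlow has no dedicated median op. For an even batch size it averages the two middle values.

```python
import tensorflow as tf

def median_over_batch(per_example_grad):
    """Coordinate-wise median over the leading batch axis.

    Input shape: (batch_size,) + variable_shape; output shape: variable_shape.
    """
    sorted_g = tf.sort(per_example_grad, axis=0)
    n = tf.shape(sorted_g)[0]
    lower = sorted_g[(n - 1) // 2]
    upper = sorted_g[n // 2]
    # For odd n, lower and upper are the same middle element
    return 0.5 * (lower + upper)

# Batch of 3 per-example gradients for a variable of shape (2,)
per_example = tf.constant([[1.0, 10.0], [3.0, 20.0], [2.0, 0.0]])
print(median_over_batch(per_example).numpy())  # medians per coordinate: 2.0 and 10.0
```

In the answer's loop this would be used as `new_grad = median_over_batch(grad)`.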

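Regarding your last comment: as I mentioned, the loss function does need to return one loss per data point, i.e. it must not average over the batch. However, that alone is not enough: if you pass this vector to tape.gradient, the gradient function will simply sum the loss values (because it only works with scalar targets). That is why jacobian is necessary.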
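
The summing behaviour can be checked directly on a tiny made-up example (the values of `w`, `x` and `y` are arbitrary): calling `tape.gradient` on the vector of per-example losses gives the same result as calling it on their sum, so the per-example information is lost.

```python
import numpy as np
import tensorflow as tf

w = tf.Variable([1.0, -2.0])
x = tf.constant([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
y = tf.constant([0.5, -1.0, 2.0])

with tf.GradientTape(persistent=True) as tape:
    per_example_loss = tf.square(tf.linalg.matvec(x, w) - y)  # shape (3,)
    summed_loss = tf.reduce_sum(per_example_loss)              # scalar

grad_of_vector = tape.gradient(per_example_loss, w)  # non-scalar target: implicitly summed
grad_of_sum = tape.gradient(summed_loss, w)
del tape

print(np.allclose(grad_of_vector.numpy(), grad_of_sum.numpy()))  # True
```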

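Finally, jacobian can be slow. In the worst case, the runtime may be multiplied by the batch size, since it has to compute that many separate gradients. However, this is done in parallel to some extent, so the slowdown may not be that severe.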
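
The parallelism mentioned above comes from vectorization: `tf.GradientTape.jacobian` has an `experimental_use_pfor` argument (default `True`) that vectorizes the per-example gradient computation, while `False` falls back to a sequential loop, which in eager mode requires a persistent tape. The sketch below, on arbitrary random data, only shows that both paths agree; timings depend on hardware and are not shown.

```python
import numpy as np
import tensorflow as tf

w = tf.Variable(tf.random.normal([5, 2], seed=1))
x = tf.random.normal([16, 5], seed=2)

# Persistent tape: the non-vectorized path needs it in eager mode
with tf.GradientTape(persistent=True) as tape:
    loss = tf.reduce_sum(tf.square(tf.matmul(x, w)), axis=1)  # shape (16,)

# Default: vectorized computation of all 16 per-example gradients
jac_pfor = tape.jacobian(loss, w)
# Fallback: one gradient computation per example inside a loop
jac_loop = tape.jacobian(loss, w, experimental_use_pfor=False)
del tape

print(jac_pfor.shape)  # (16, 5, 2)
print(np.allclose(jac_pfor.numpy(), jac_loop.numpy(), atol=1e-5))  # True
```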

Solution 1: accepted, score 2, 2020-11-08 09:44:54

如何在 Keras、Tensorflow 中覆蓋優化算法的梯度向量計算方法？

問題描述

1 個解決方案

解決方案1 2 已采納 2020-11-08 09:44:54

解決方案1
2 已采納 2020-11-08 09:44:54