
Tensorflow - does autodiff relieve us from the back-prop implementation?

Question

When using TensorFlow, for instance when implementing a custom neural network layer, what is the standard practice for implementing back-propagation? Do we no longer have to work out the differentiation formulas ourselves?

Background

With numpy, when creating a layer, e.g. matmul, the back-propagation gradient is first derived analytically and then coded accordingly.

def forward(self, X):
    self._X = X
    np.matmul(self._X, self.W.T, out=self._Y)
    return self._Y

def backward(self, dY):
    """dY = dL/dY is the gradient of the loss L with respect to the matmul output Y."""
    self._dY = dY
    return np.matmul(self._dY, self.W, out=self._dX)

In TensorFlow, there is autodiff, which seems to take care of the Jacobian calculation. Does this mean we do not have to manually derive the gradient formulas, but can let the TensorFlow tape look after it?

Computing gradients

To differentiate automatically, TensorFlow needs to remember what operations happen in what order during the forward pass. Then, during the backward pass, TensorFlow traverses this list of operations in reverse order to compute gradients.
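For illustration, a minimal sketch of that record-then-reverse behaviour (assuming TF 2.x eager execution); trainable tf.Variables are recorded automatically, while a plain tensor has to be watched explicitly:

import tensorflow as tf

x = tf.constant(2.0)                 # a plain tensor: not watched by default

with tf.GradientTape() as tape:
    tape.watch(x)                    # ask the tape to record operations on x
    y = x * x + 3.0 * x              # forward pass, recorded on the tape

# backward pass: the tape is traversed in reverse; dy/dx = 2x + 3 = 7 at x = 2
print(tape.gradient(y, x).numpy())   # 7.0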

Basically, TensorFlow is a symbolic math library based on dataflow and differentiable programming. We do not have to work out the differentiation formulas manually; all of that math is done behind the scenes, automatically. You quoted correctly from the official doc about computing gradients. However, in case you want to know how it can be done manually with numpy, I would recommend checking the fantastic course Neural Networks and Deep Learning, especially week 4, or an alternative source here.
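To make that concrete for the matmul layer in the question, here is a minimal sketch (class name and shapes are made up for illustration) of a custom tf.keras layer where only the forward pass (call) is written; the tape derives dL/dW and dL/dX for us:

import tensorflow as tf

class MatMul(tf.keras.layers.Layer):
    """Linear layer Y = X @ W.T with no hand-written backward()."""
    def __init__(self, units):
        super().__init__()
        self.units = units

    def build(self, input_shape):
        # W has shape (out_features, in_features), like self.W in the numpy version
        self.W = self.add_weight(
            shape=(self.units, input_shape[-1]),
            initializer="random_normal", trainable=True, name="W")

    def call(self, X):
        return tf.matmul(X, self.W, transpose_b=True)  # same as np.matmul(X, W.T)

layer = MatMul(4)
X = tf.random.normal((2, 3))

with tf.GradientTape() as tape:
    Y = layer(X)
    loss = tf.reduce_sum(Y ** 2)     # some scalar loss

# gradients of the loss w.r.t. the layer's weights, computed by autodiff
print(tape.gradient(loss, layer.trainable_weights)[0].shape)  # (4, 3)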


FYI, in TF 2 we can do custom training from scratch by overriding the train_step method of the tf.keras.Model class, and there we can use the tf.GradientTape API for automatic differentiation, i.e. computing the gradient of a computation with respect to some inputs. That same official page includes more information on this. Also, be sure to see this well-written article on tf.GradientTape. For example, using this API we can easily compute the gradient as follows:

import tensorflow as tf 

# some input 
x = tf.Variable(3.0, trainable=True)

with tf.GradientTape() as tape:
    # some output 
    y = x**3 + x**2 + x + 5

# compute gradient of y wrt x 
print(tape.gradient(y, x).numpy()) 
# 34

Also, we can compute higher-order derivatives, such as:

x = tf.Variable(3.0, trainable=True)

with tf.GradientTape() as tape1:

    with tf.GradientTape() as tape2:
        y = x**3 + x**2 + x + 5
    # first derivative 
    order_1 = tape2.gradient(y, x)

# second derivative 
order_2 = tape1.gradient(order_1, x)

print(order_2.numpy()) 
# 20.0
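And since the question mentions Jacobians: for a non-scalar output, the tape can also return the full Jacobian dY/dX via tape.jacobian. A small sketch for an element-wise square, whose Jacobian is diagonal:

import tensorflow as tf

x = tf.Variable([1.0, 2.0, 3.0])

with tf.GradientTape() as tape:
    y = x * x                        # element-wise square, y_i = x_i^2

# Jacobian dY/dX is diagonal with entries 2 * x_i
print(tape.jacobian(y, x).numpy())
# [[2. 0. 0.]
#  [0. 4. 0.]
#  [0. 0. 6.]]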

Now, in custom model training in tf.keras, we first make a forward pass and compute the loss, next compute the gradients of the loss with respect to the trainable variables of the model, and then update the weights of the model based on these gradients. Below is a code snippet of it, and here are the end-to-end details: Writing a training loop from scratch.

# Open a GradientTape to record the operations run
# during the forward pass, which enables auto-differentiation.
with tf.GradientTape() as tape:

    # Run the forward pass of the layer.
    # The operations that the layer applies
    # to its inputs are going to be recorded
    # on the GradientTape.
    logits = model(x_batch_train, training=True)  # Logits for this minibatch

    # Compute the loss value for this minibatch.
    loss_value = loss_fn(y_batch_train, logits)

# Use the gradient tape to automatically retrieve
# the gradients of the trainable variables with respect to the loss.
grads = tape.gradient(loss_value, model.trainable_weights)

# Run one step of gradient descent by updating
# the value of the variables to minimize the loss.
optimizer.apply_gradients(zip(grads, model.trainable_weights))
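And, as mentioned at the start of this answer, the same tape-based update can be packaged into an overridden train_step of a tf.keras.Model so that model.fit drives the loop. A minimal sketch (the architecture, loss choice, and data names are made up for illustration):

import tensorflow as tf

class CustomModel(tf.keras.Model):
    def train_step(self, data):
        x, y = data
        # Forward pass and loss, recorded on the tape.
        with tf.GradientTape() as tape:
            y_pred = self(x, training=True)
            loss = tf.reduce_mean(
                tf.keras.losses.sparse_categorical_crossentropy(y, y_pred))
        # Backward pass: autodiff gives dL/dW for every trainable weight.
        grads = tape.gradient(loss, self.trainable_weights)
        self.optimizer.apply_gradients(zip(grads, self.trainable_weights))
        return {"loss": loss}

# Usage: wrap any architecture in the custom class and let fit() call train_step.
inputs = tf.keras.Input(shape=(784,))
outputs = tf.keras.layers.Dense(10, activation="softmax")(inputs)
model = CustomModel(inputs, outputs)
model.compile(optimizer="adam")
# model.fit(x_train, y_train, epochs=1)  # x_train / y_train are hypothetical here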

Correct, you just need to define the forward pass and TensorFlow generates an appropriate backward pass. From tf2 autodiff:

TensorFlow provides the tf.GradientTape API for automatic differentiation; that is, computing the gradient of a computation with respect to some inputs, usually tf.Variables. TensorFlow "records" relevant operations executed inside the context of a tf.GradientTape onto a "tape". TensorFlow then uses that tape to compute the gradients of a "recorded" computation using reverse mode differentiation.

To do this, TensorFlow is given the forward pass (or the loss) and a set of tf.Variables to compute the derivatives with respect to. This process is only possible for a specific set of operations defined by TensorFlow itself. In order to create a custom NN layer you need to define its forward pass using these operations (all of them part of TF or translated to it by some converter).*

Since you seem to have a numpy background, you could define your custom forward pass using numpy and then translate it to TensorFlow using the tf_numpy API. You could alternatively use tf.numpy_function. After this, TF will create the backpropagation for you.
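For instance, a small sketch with the TF NumPy API (tf.experimental.numpy, available in TF 2.4+): because these numpy-style ops are regular TF ops under the hood, the tape differentiates through them without any hand-written backward(); the shapes below are made up to mirror the matmul layer in the question:

import tensorflow as tf
import tensorflow.experimental.numpy as tnp

W = tf.Variable(tf.random.normal((4, 3)))   # (out_features, in_features), like self.W above
X = tf.random.normal((2, 3))                # a batch of 2 inputs

with tf.GradientTape() as tape:
    Y = tnp.matmul(X, tnp.transpose(W))     # numpy-style forward pass: X @ W.T
    loss = tnp.sum(Y ** 2)                  # some scalar loss

print(tape.gradient(loss, W).shape)         # dL/dW has shape (4, 3), no manual backward()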

(*) Note that some operations, such as control statements, are not differentiable themselves and are thus invisible to gradient-based optimizers. There are some caveats about these.
