
Tensorflow 2.0: How can I fully customize a Tensorflow training loop like I can with PyTorch?

I used to use TensorFlow a lot, but moved over to PyTorch because it was just a lot easier to debug. The nice thing I found with PyTorch is that I have to write my own training loop, so I can step through the code and find errors. I can fire up pdb and check the tensor shapes and transformations, etc., without difficulty.

In TensorFlow I was using the model.fit() function all the time, so any error I got was something like six pages of C code in which the error message gave no indication of where the problem was in my Python code. Users can't step through the model.fit() function since it is a static graph, so that really slowed down my development process. BUT, I was thinking about using TensorFlow again, and I was wondering whether a user can step through a custom training loop and look at tensor shapes, etc., or whether even a custom training loop is compiled to a static graph and hence users cannot step through it?

I did google this question, but all of the tutorials for custom training loops in Tensorflow focus on custom loops being for advanced users, such as if you want to apply some exotic callback while training or if you want to apply some conditional logic. So the simple question of whether it is easy to step through a custom training loop is not answered.

Any help is appreciated. Thanks.

This is almost as custom and bare-bones as I can make it. I also used subclassed layers.

import tensorflow as tf
import tensorflow_datasets as tfds

# iris has 150 examples: use the first 125 for training and the last 25 for testing
ds = tfds.load('iris', split='train', as_supervised=True)

train = ds.take(125).shuffle(125).batch(1)
test = ds.skip(125).take(25).shuffle(25).batch(1)

class Dense(tf.Module):
  def __init__(self, in_features, out_features, activation, name=None):
    super().__init__(name=name)
    self.activation = activation
    self.w = tf.Variable(
      tf.initializers.GlorotUniform()([in_features, out_features]), name='weights')
    self.b = tf.Variable(tf.zeros([out_features]), name='biases')
  def __call__(self, x):
    y = tf.matmul(x, self.w) + self.b
    return self.activation(y)

class SequentialModel(tf.Module):
  def __init__(self, name):
    super().__init__(name=name)
    self.dense1 = Dense(in_features=4, out_features=16, activation=tf.nn.relu)
    self.dense2 = Dense(in_features=16, out_features=32, activation=tf.nn.relu)
    self.dense3 = Dense(in_features=32, out_features=3, activation=tf.nn.softmax)

  def __call__(self, x):
    x = self.dense1(x)
    x = self.dense2(x)
    x = self.dense3(x)
    return x

model = SequentialModel(name='sequential_model')

loss_object = tf.losses.SparseCategoricalCrossentropy(from_logits=False)

def compute_loss(model, x, y):
  out = model(x)
  loss = loss_object(y_true=y, y_pred=out)
  return loss, out


def get_grad(model, x, y):
    # record the forward pass on the tape, then take gradients w.r.t. the model's variables
    with tf.GradientTape() as tape:
        loss, out = compute_loss(model, x, y)
    gradients = tape.gradient(loss, model.trainable_variables)
    return loss, gradients, out


optimizer = tf.optimizers.Adam()

verbose = "Epoch {:2d} Loss: {:.3f} TLoss: {:.3f} Acc: {:=7.2%} TAcc: {:=7.2%}"

for epoch in range(1, 10 + 1):
    train_loss = tf.constant(0.)
    train_acc = tf.constant(0.)
    test_loss = tf.constant(0.)
    test_acc = tf.constant(0.)

    for n_train, (x, y) in enumerate(train, 1):
        loss_value, grads, out = get_grad(model, x, y)
        optimizer.apply_gradients(zip(grads, model.trainable_variables))
        train_loss += loss_value
        train_acc += tf.metrics.sparse_categorical_accuracy(y, out)[0]

    for n_test, (x, y) in enumerate(test, 1):
        loss_value, _, out = get_grad(model, x, y)
        test_loss += loss_value
        test_acc += tf.metrics.sparse_categorical_accuracy(y, out)[0]

    print(verbose.format(epoch,
                         float(tf.divide(train_loss, n_train)),
                         float(tf.divide(test_loss, n_test)),
                         float(tf.divide(train_acc, n_train)),
                         float(tf.divide(test_acc, n_test))))
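
Note that nothing above is wrapped in tf.function, so the whole loop runs eagerly and you can step through it with pdb just as you would in PyTorch. As an illustration (not part of the original code), compute_loss could be rewritten with a breakpoint and a shape check added:

def compute_loss(model, x, y):
  out = model(x)
  # everything here is a concrete EagerTensor, so a debugger works as usual
  # breakpoint()  # then `p x.shape`, `p out.shape`, `p out.numpy()` in pdb
  print(x.shape, out.shape)  # shapes are known immediately, no session needed
  loss = loss_object(y_true=y, y_pred=out)
  return loss, out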

Although I think TensorFlow's errors are very helpful, the answer to your question is yes: you can work with TensorFlow at any level, from basic to high-level, and you can run your network dynamically or statically.

When I wanted to learn TensorFlow and Keras, I wrote some code on Colab, and you can find one notebook that's related to your question here. But I will write the part that you want here (it's not the full code):

# standard imports; Conv2D_custom, Dense_custom and the (trainX, trainy) data
# are defined earlier in the linked notebook
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.layers import MaxPooling2D, Flatten, Dropout

batch_size = 32
epochs = 10
loss_func = keras.losses.CategoricalCrossentropy()
opt = keras.optimizers.Adam(learning_rate=0.003)

(valX, valy) = (trainX[-10000:], trainy[-10000:])

# for easily shuffling later
train_dataset = tf.data.Dataset.from_tensor_slices((trainX, trainy))

# layers
conv1 = Conv2D_custom(32, (3,3), activation='relu', input_shape = trainX.shape[1:])
conv2 = Conv2D_custom(64, (3,3), activation='relu')
dense = Dense_custom(10, activation='softmax')
max1 = MaxPooling2D((2, 2))
max2 = MaxPooling2D((2, 2))
flat = Flatten()
dropout = Dropout(0.5)

for i in range(epochs):
  print("Epoch: ", i)
  epoch_loss = 0

  train_dataset = train_dataset.shuffle(buffer_size=1024)
  train_batched = train_dataset.batch(batch_size)

  for step, (batchX, batchy) in enumerate(train_batched):
    with tf.GradientTape() as tape:
      x = conv1(batchX)
      x = max1(x)
      x = conv2(x)
      x = max2(x)
      x = flat(x)
      x = dropout(x, training=True)
      x = dense(x)
      loss = loss_func(batchy, x)

    trainable_vars = conv1.trainable_weights + conv2.trainable_weights + dense.trainable_weights
    grads = tape.gradient(loss, trainable_vars)
    opt.apply_gradients(zip(grads, trainable_vars))

    epoch_loss += loss

    if step % 200 == 0:
      print("\tStep ", step, ":\t loss = ", epoch_loss.numpy()/(step+1))
  
  # epoch ended, validate it
  x = conv1(valX)
  x = max1(x)
  x = conv2(x)
  x = max2(x)
  x = flat(x)
  x = dropout(x, training=False)
  x = dense(x)
  val_loss = loss_func(valy, x)
  
  print("Epoch ", i, " ended.\t", "loss = ", epoch_loss.numpy()/len(train_batched), " ,\tval_loss = ", val_loss.numpy())

It's not pretty and you can do much better than this (I didn't know a lot about TensorFlow and Keras at the time), but it works well on the MNIST dataset and you can customize it however you want.

TensorFlow has something called eager execution, which lets you run your network dynamically; in TensorFlow 2.x it is enabled by default. Most of the time TensorFlow handles this for you, and you don't even need to tell it when to run in eager mode and when to run in graph mode: it will automatically use graph mode when possible (for example, for code wrapped in tf.function) to get more performance.
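
For example, here is a minimal sketch (assuming a model, optimizer, and loss_fn already exist) showing the same training step run eagerly versus traced into a graph with tf.function:

import tensorflow as tf

def train_step_eager(model, optimizer, loss_fn, x, y):
    # runs eagerly: you can set a breakpoint here and inspect x.shape, loss, etc.
    with tf.GradientTape() as tape:
        loss = loss_fn(y, model(x))
    grads = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))
    return loss

@tf.function  # traced into a static graph: faster, but harder to step through
def train_step_graph(model, optimizer, loss_fn, x, y):
    return train_step_eager(model, optimizer, loss_fn, x, y)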

And again, you can be sure that you can work with TensorFlow at whatever level you want, from almost-from-scratch code up to the high level of Keras.

Hope it helps, and have fun programming :)

TensorFlow uses eager execution, which means your graph is built dynamically. Of course it's not as dynamic as PyTorch, but Google is trying its best and has incorporated a lot of such features in 2.0 and beyond.

To write a custom loop in TensorFlow you need to use tf.GradientTape. I would say it involves the same steps as in PyTorch: compute the gradients and then update the weights using the optimizer.
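
For example, a minimal self-contained sketch of those steps on a single toy parameter, with the rough PyTorch equivalents noted in the comments:

import tensorflow as tf

w = tf.Variable(2.0)                          # a toy "model" parameter
x, y = tf.constant(3.0), tf.constant(7.0)
optimizer = tf.optimizers.SGD(learning_rate=0.1)

with tf.GradientTape() as tape:               # records the forward pass (autograd is implicit in PyTorch)
    loss = tf.square(w * x - y)               # forward pass and loss
grads = tape.gradient(loss, [w])              # roughly loss.backward() in PyTorch
optimizer.apply_gradients(zip(grads, [w]))    # roughly optimizer.step() in PyTorch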
