
Converting a tensor to a NumPy array inside a custom training loop in TensorFlow 1.15

I am in the process of training a model in TensorFlow and I want to print the loss of the model after each batch. I am using a custom training loop that looks something like this:

import tensorflow as tf
from tensorflow.keras.losses import cosine_similarity
from tensorflow.keras.optimizers import Adam


model = get_model(**model_params)
g = get_generator(**generator_params)

optimizer = Adam()
epochs = 10

for epoch in range(epochs):
    for i in range(len(g)):
        with tf.GradientTape() as tape:
            x, y = g[i]
            model_prediction = model(x)
            loss = cosine_similarity(y, model_prediction)
            gradients = tape.gradient(loss, model.trainable_weights)
            optimizer.apply_gradients(zip(gradients, model.trainable_weights))

            print(f"Batch {i}/{len(g)}. Loss: {loss.eval(session=tf.Session()): .4f}")

Since the loss is a tensor, to actually see the values I need to convert it to a NumPy array (the plan isn't just to print the array; once I can convert the tensor into an array, that solves my problem). The way I have been trying it, unfortunately, results in the following error:

Failed precondition: Error while reading resource variable dense_5/kernel from Container: localhost. 
This could mean that the variable was uninitialized. Not found: Container localhost does not exist.

I have also tried editing the loop as follows:


for epoch in range(epochs):
    for i in range(len(g)):
        with tf.GradientTape() as tape, tf.Session() as session:
            # training code
            loss_numpy = session.run(loss)

This gives me the same error as above. I have also tried initializing the global variables at each training step:


for epoch in range(epochs):
    for i in range(len(g)):
        with tf.GradientTape() as tape, tf.Session() as session:
            # training code
            init = tf.global_variables_initializer()
            session.run(init)
            print(f"Batch {i}/{len(g)}. Loss: {session.run(loss): .4f}")

This does not throw an error, but it is pretty slow and prints a lot of Nvidia-related log output that I would like to avoid.

Is there a way to avoid the error without having to do the variable initialization at each step? Or, failing that, is there a way to silence the Nvidia-related output?

Looking at the code and the error, my guess is that you're not correctly managing the TensorFlow session that Keras needs and uses.

One possibility is that the session isn't being correctly initialized; that can happen because you're not using the vanilla Keras training regime, which normally takes care of that for you. The other is that it gets part of the way through and then, because you're using the with statement, the session is closed as soon as the block inside the with finishes. That is exactly what with is for in Python.
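As a small illustration of that second point (nothing specific to your model), anything you do with a session opened via with stops working the moment the block ends:

import tensorflow as tf

with tf.Session() as sess:
    print(sess.run(tf.constant(1.0)))  # fine: the session is still open here

# Once the with block exits, the session is closed, so any further
# sess.run(...) call here would fail instead of evaluating the tensor.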

I haven't tried it myself, but my hunch is that if you instantiate a session yourself before you start the training, and then keep that same session alive through the whole process, this ought to work.
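Something along these lines, for example (a rough, untested sketch, with model, g, optimizer and cosine_similarity set up exactly as in your snippet): fetch the session Keras already created for the model with tf.keras.backend.get_session() once, before the loops, and reuse it for every loss evaluation instead of opening a fresh, uninitialized session per batch.

import tensorflow as tf
from tensorflow.keras import backend as K

# One long-lived session for the whole run. K.get_session() returns the
# session Keras is already using for the model, so its variables are
# already initialized and no per-batch global_variables_initializer()
# call should be needed.
session = K.get_session()

for epoch in range(epochs):
    for i in range(len(g)):
        with tf.GradientTape() as tape:
            x, y = g[i]
            model_prediction = model(x)
            loss = cosine_similarity(y, model_prediction)
            gradients = tape.gradient(loss, model.trainable_weights)
            optimizer.apply_gradients(zip(gradients, model.trainable_weights))

        # Evaluate the loss tensor in the existing session rather than a new
        # one; .mean() collapses per-sample values into a single number.
        loss_value = session.run(loss)
        print(f"Batch {i}/{len(g)}. Loss: {loss_value.mean():.4f}")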

Incidentally, you don't actually need to convert the loss to a NumPy object to print or otherwise inspect it. You'll probably have an easier time (in speed and stability) if you do your math directly with TensorFlow and avoid having to do the conversion.
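For instance, if what you ultimately want to report is an average loss, the averaging itself can stay in TensorFlow; the toy values below just stand in for y and model_prediction:

import tensorflow as tf
from tensorflow.keras.losses import cosine_similarity

# Toy stand-ins for y and model_prediction.
y_true = tf.constant([[1.0, 0.0], [0.0, 1.0]])
y_pred = tf.constant([[0.9, 0.1], [0.2, 0.8]])

# cosine_similarity returns one value per sample; averaging it is itself
# a TensorFlow op, so no NumPy round trip is needed for the arithmetic.
mean_loss = tf.reduce_mean(cosine_similarity(y_true, y_pred))

# Only evaluate when you finally want to look at the number.
with tf.Session() as sess:
    print(sess.run(mean_loss))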
