I am interested in computing the gradient of a loss that is calculated from a matrix multiplication in TensorFlow with Eager Execution. I can do so if the product is computed as a tensor, but not if it's assign()ed in place to a variable. Here is the greatly reduced code:
    import tensorflow as tf
    import numpy as np

    tf.enable_eager_execution()

    multipliers_net = tf.get_variable("multipliers", shape=(1, 3, 3, 1),
                                      initializer=tf.random_normal_initializer())
    activations_net = tf.Variable(tf.ones_like(multipliers_net))
    output_indices = [(0, 1, 2, 0)]

    LEARNING_RATE = 0.1  # placeholder; defined elsewhere in the full code

    def step():
        global activations_net

        #### PROBLEMATIC ####
        activations_net.assign(multipliers_net * activations_net)
        #### NO PROBLEM ####
        # activations_net = multipliers_net * activations_net

        return tf.gather_nd(activations_net, output_indices)

    def train(targets):
        for y in targets:
            with tf.GradientTape() as tape:
                out = step()
                print("OUT", out)
                loss = tf.reduce_mean(tf.square(y - out))
                print("LOSS", loss)
            de_dm = tape.gradient(loss, multipliers_net)
            print("GRADIENT", de_dm, sep="\n")
            multipliers_net.assign(LEARNING_RATE * de_dm)

    targets = [[2], [3], [4], [5]]
    train(targets)
As it stands, this code shows the correct OUT and LOSS values, but the GRADIENT is printed as None. However, if the line below "PROBLEMATIC" is commented out and the "NO PROBLEM" line is uncommented, the gradient is computed just fine. I infer this is because in the second case activations_net becomes a Tensor, but I don't know why that suddenly makes the gradient computable, whereas before it was not.
I'm pretty sure that I should keep activations_net and multipliers_net as Variables, because in the larger scheme of things both are updated dynamically, and as I understand it such things are best kept as Variables rather than constantly reassigned Tensors.
I'll try to explain to the best of my knowledge. The problem occurs in this line:

    de_dm = tape.gradient(loss, multipliers_net)

If you print(tape.watched_variables()) in both the "PROBLEMATIC" and "NO PROBLEM" cases, you'll see that in the first case the tape 'watches' the same multipliers_net variable twice. You can try tape.reset() and tape.watch(), but they have no effect as long as you pass an assign op into the tape. If you try multipliers_net.assign(any_variable) inside tf.GradientTape(), you'll find that it won't work. But if you assign something that produces a tensor, e.g. tf.ones_like(), it will work.
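To see the difference concretely, here is a minimal standalone sketch (not from the question; the variable names v and w are made up) contrasting an assign() inside the tape with the equivalent plain tensor op. It is written against the TF 2.x eager API, with a guard so it also runs on TF 1.x eager mode:

```python
import tensorflow as tf

# TF 1.x needs eager mode enabled explicitly; in TF 2.x it is on by default.
if hasattr(tf, "enable_eager_execution"):
    tf.enable_eager_execution()

v = tf.Variable(tf.ones((2, 2)))
w = tf.Variable(2.0 * tf.ones((2, 2)))

# Case 1: assign() inside the tape. The in-place update is not a
# differentiable op, so the path from w to the loss is cut.
with tf.GradientTape() as tape:
    v.assign(w * v)
    loss = tf.reduce_sum(v)
grad_assign = tape.gradient(loss, w)   # None

# Case 2: the same math as a plain tensor op, fully recorded by the tape.
with tf.GradientTape() as tape:
    out = w * v
    loss = tf.reduce_sum(out)
grad_tensor = tape.gradient(loss, w)   # d(loss)/dw = v, elementwise

print(grad_assign)
print(grad_tensor)
```

Only the second gradient is defined, which is exactly the PROBLEMATIC vs. NO PROBLEM split from the question.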
    multipliers_net.assign(LEARNING_RATE * de_dm)

This line works for the same reason: assign() accepts the eager tensor produced by the multiplication. Hope this helps.
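Putting the two observations together, one possible workaround (my sketch, not part of the original answer) is to keep both networks as Variables but do the multiplication as a plain tensor op inside the tape, deferring the assign() calls until after tape.gradient() has run. The name train_step and the LEARNING_RATE value are assumptions; I've also used assign_sub for a conventional gradient-descent update where the original snippet overwrote the weights with assign(), and tf.random.normal in place of the TF 1.x tf.get_variable/tf.random_normal_initializer pair:

```python
import tensorflow as tf

if hasattr(tf, "enable_eager_execution"):
    tf.enable_eager_execution()

LEARNING_RATE = 0.1  # hypothetical value; the original snippet never defines it

multipliers_net = tf.Variable(tf.random.normal((1, 3, 3, 1)), name="multipliers")
activations_net = tf.Variable(tf.ones_like(multipliers_net))
output_indices = [(0, 1, 2, 0)]

def train_step(y):
    with tf.GradientTape() as tape:
        # Keep the multiplication a plain tensor op so the tape records it.
        new_activations = multipliers_net * activations_net
        out = tf.gather_nd(new_activations, output_indices)
        loss = tf.reduce_mean(tf.square(y - out))
    de_dm = tape.gradient(loss, multipliers_net)
    # Perform the in-place Variable updates only after the gradient is taken.
    activations_net.assign(new_activations)
    multipliers_net.assign_sub(LEARNING_RATE * de_dm)
    return loss, de_dm

for y in [[2.0], [3.0], [4.0], [5.0]]:
    loss, de_dm = train_step(y)
    print(float(loss))
```

This preserves the question's requirement that both nets stay Variables, while the tape only ever sees differentiable tensor ops.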