
TensorFlow cannot get gradient wrt a Variable, but can wrt a Tensor

I am interested in computing the gradient of a loss that is calculated from the product of a matrix multiplication in TensorFlow with Eager Execution. I can do so if the product is computed as a tensor, but not if it is assign()ed in place to a variable. Here is the greatly reduced code:

import tensorflow as tf
import numpy as np
tf.enable_eager_execution()

LEARNING_RATE = 0.1  # used in train() below; arbitrary value for this reduced example

multipliers_net = tf.get_variable("multipliers", shape=(1, 3, 3, 1),
                                  initializer=tf.random_normal_initializer())
activations_net = tf.Variable(tf.ones_like(multipliers_net))
output_indices = [(0, 1, 2, 0)]

def step():
    global activations_net

    #### PROBLEMATIC ####
    activations_net.assign(multipliers_net * activations_net)
    #### NO PROBLEM ####
    # activations_net = multipliers_net * activations_net

    return tf.gather_nd(activations_net, output_indices)


def train(targets):
    for y in targets:
        with tf.GradientTape() as tape:
            out = step()
            print("OUT", out)
            loss = tf.reduce_mean(tf.square(y - out))
            print("LOSS", loss)
        de_dm = tape.gradient(loss, multipliers_net)
        print("GRADIENT", de_dm, sep="\n")
        multipliers_net.assign(LEARNING_RATE * de_dm)


targets = [[2], [3], [4], [5]]

train(targets)

As it stands, this code will show the correct OUT and LOSS values, but the GRADIENT will be printed as None. However, if the line below "PROBLEMATIC" is commented out and the "NO PROBLEM" line is uncommented, the gradient is computed just fine. I infer this is because in the second case activations_net becomes a Tensor, but I don't know why that suddenly makes the gradient computable, whereas before it was not.
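Not part of the original snippet, but the contrast can be reproduced with two scalar Variables. Here is a minimal sketch assuming the same TF 1.x eager setup as above (m and a are just stand-ins for multipliers_net and activations_net):

import tensorflow as tf
tf.enable_eager_execution()

m = tf.Variable(3.0)   # stands in for multipliers_net
a = tf.Variable(1.0)   # stands in for activations_net

# Product computed as a plain tensor: the gradient w.r.t. m exists
with tf.GradientTape() as tape:
    out = m * a
print(tape.gradient(out, m))   # tf.Tensor(1.0, ...)

# Product pushed through assign(): the path back to m is lost
with tf.GradientTape() as tape:
    a.assign(m * a)
    out = tf.identity(a)       # read the updated value as a tensor
print(tape.gradient(out, m))   # None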

I'm pretty sure that I should keep activations_net and multipliers_net as Variables, because in the larger scheme of things both are updated dynamically, and as I understand it such things are best kept as Variables rather than constantly reassigned Tensors.

I'll try to explain to the best of my knowledge. The problem occurs in this line:

de_dm = tape.gradient(loss, multipliers_net)

If you print(tape.watched_variables()) in both the "PROBLEMATIC" and "NO PROBLEM" cases, you'll see that in the first case the tape 'watches' the same multipliers_net variable twice. You can try tape.reset() and tape.watch(), but they will have no effect as long as you pass the assign op into the tape. If you try multipliers_net.assign(any_variable) inside tf.GradientTape(), you'll find that it won't work. But if you try assigning something that produces a tensor, e.g. tf.ones_like(), it will work.
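As a concrete illustration of that check (just a sketch, reusing step() from the question), you can dump what the tape recorded and compare the two cases:

with tf.GradientTape() as tape:
    out = step()
# See which variables the tape recorded while step() ran; compare the
# output with the "PROBLEMATIC" and "NO PROBLEM" versions of step()
print([v.name for v in tape.watched_variables()])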

multipliers_net.assign(LEARNING_RATE * de_dm)

This works for the same reason: assign() seems to accept only eager tensors. Hope this helps.
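Not part of the original answer, but one way to keep both networks as Variables, as the question wants, is to compute the product as a plain tensor inside the tape (so it stays differentiable), take the gradient, and only assign() back to the Variables outside the tape. Below is a sketch of such a training loop, reusing LEARNING_RATE, multipliers_net, activations_net and output_indices from the question; the assign_sub() call is a conventional gradient-descent update, swapped in for the question's assign(LEARNING_RATE * de_dm) purely for illustration:

def train(targets):
    for y in targets:
        with tf.GradientTape() as tape:
            # Keep the product a plain tensor so the tape can trace it
            new_activations = multipliers_net * activations_net
            out = tf.gather_nd(new_activations, output_indices)
            loss = tf.reduce_mean(tf.square(y - out))
        de_dm = tape.gradient(loss, multipliers_net)
        # Update the Variables outside the tape, where assign() is harmless
        activations_net.assign(new_activations)
        multipliers_net.assign_sub(LEARNING_RATE * de_dm)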
