
tf.GradientTape giving None gradient

I'm trying to write a custom training loop. After creating the model, I added an extra trainable parameter to some layers of my model. I use this extra parameter to update my original parameters on every forward pass. But when I calculate the gradients, the gradient is None for the extra parameter that I added last. The code is given below:

import tensorflow as tf
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense

model = Sequential()
model.add(tf.keras.layers.Flatten(input_shape=(1, 1)))
model.add(Dense(1, activation='relu'))
model.add(Dense(2, activation='softmax'))

# extra trainable parameter x1 added to the first Dense layer
model.layers[1].add_weight(name="x1", shape=(1,),
                           initializer=tf.keras.initializers.Constant(value=1.0),
                           trainable=True)

# feature, labels: training data defined elsewhere
dataset = tf.data.Dataset.from_tensor_slices((feature, labels))

for i, (x_batch_train, y_batch_train) in enumerate(dataset):
    with tf.GradientTape() as tape:
        for par in model.layers[1].trainable_weights:
            if "x1" in par.name:
                bits = tf.convert_to_tensor(par)
        for par in model.layers[1].trainable_weights:
            if "kernel" in par.name:
                par = bits + 1.0  # intended to update the kernel from x1
        x = model(x_batch_train, training=True)
        loss = tf.keras.losses.SparseCategoricalCrossentropy()(y_batch_train, x)
        val = tape.gradient(loss, model.trainable_weights)
        for v in val:
            print(v)

Here, I have added one extra parameter called x1, and it updates the kernel of the Dense layer. But I'm getting a None gradient for the x1 parameter. The output is:

tf.Tensor([[0.]], shape=(1, 1), dtype=float32)
tf.Tensor([-0.], shape=(1,), dtype=float32)
None
tf.Tensor([[0. 0.]], shape=(1, 2), dtype=float32)
tf.Tensor([-0.5  0.5], shape=(2,), dtype=float32)

Why is this happening?

The problem is that the changes you are making to the layer's weights have no direct connection to the output of the model in the context of tf.GradientTape and are therefore not tracked: the assignment par = bits + 1.0 only rebinds the Python loop variable, it never modifies the kernel variable itself, so x1 never participates in the forward pass.
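
A quick way to confirm this against the question's model (a minimal sketch; the dummy input and the layer shorthand are only for illustration):

layer = model.layers[1]                        # the Dense(1) layer that got the extra x1 weight
kernel_before = layer.kernel.numpy().copy()

with tf.GradientTape() as tape:
    bits = [w for w in layer.trainable_weights if "x1" in w.name][0]
    for par in layer.trainable_weights:
        if "kernel" in par.name:
            par = bits + 1.0                   # rebinds 'par' only; the kernel is untouched
    y = model(tf.ones((1, 1, 1)), training=True)

print((layer.kernel.numpy() == kernel_before).all())  # True: the kernel was never changed
print(tape.gradient(y, bits))                          # None: x1 is not connected to the output

You could solve this with a simple custom layer: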

import tensorflow as tf

class DenseLayer(tf.keras.layers.Layer):
    def __init__(self, units=1):
        super(DenseLayer, self).__init__()
        self.units = units

    def build(self, input_shape):
        self.w = self.add_weight(name="kernel",
                                 shape=[int(input_shape[-1]), self.units],
                                 trainable=True)
        self.b = self.add_weight(shape=(self.units,),
                                 initializer="zeros",
                                 trainable=True)
        # the extra parameter: created in build, so it belongs to the layer's weights
        self.bits = self.add_weight(name="x1",
                                    shape=[int(input_shape[-1]), self.units],
                                    initializer=tf.keras.initializers.Ones(),
                                    trainable=True)

    def call(self, inputs):
        # bits is used in the forward pass, so the tape records its contribution
        return tf.nn.relu(tf.matmul(inputs, (self.w + self.bits + 1.0)) + self.b)

dense_layer = DenseLayer(1)
model = tf.keras.Sequential()
model.add(tf.keras.layers.Flatten(input_shape=(1, 1)))
model.add(dense_layer)
model.add(tf.keras.layers.Dense(2, activation='softmax'))
model.summary()

# dummy data with the same shape as the question's input
dataset = tf.data.Dataset.from_tensor_slices(
    (tf.random.normal((50, 1, 1)),
     tf.random.uniform((50,), maxval=2, dtype=tf.int32))).batch(2)

loss_fn = tf.keras.losses.SparseCategoricalCrossentropy()
optimizer = tf.keras.optimizers.Adam(learning_rate=0.01)

for i, (x_batch_train, y_batch_train) in enumerate(dataset):
    with tf.GradientTape() as tape:
        y = model(x_batch_train, training=True)
        loss = loss_fn(y_batch_train, y)
    val = tape.gradient(loss, model.trainable_weights)
    for v in val:
        print(v)
    optimizer.apply_gradients(zip(val, model.trainable_variables))
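
Since the extra weight is now created in build and used in call, tape.gradient returns a tensor for it. A quick check (a small sketch that just pairs each weight's name with its gradient from the last batch):

for w, g in zip(model.trainable_weights, val):
    print(w.name, None if g is None else g.shape)  # the x1 weight now has a non-None gradient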

Your idea is good. I am not extending the previous answer, but this question has come up before for custom layers, and you can do the same thing for an LSTM by training with model.fit( ... ).

It is not about the GradientTape.

[ Sample - Dense ]:

"""""""""""""""""""""""""""""""""""""""""""""""""""""""""
: Class / Function
"""""""""""""""""""""""""""""""""""""""""""""""""""""""""
class MyDenseLayer(tf.keras.layers.Layer):
    def __init__(self, num_outputs, num_add):
        super(MyDenseLayer, self).__init__()
        self.num_outputs = num_outputs
        self.num_add = num_add

    def build(self, input_shape):
        self.kernel = self.add_weight("kernel",
                                      shape=[int(input_shape[-1]), self.num_outputs])

    def call(self, inputs):
        temp = tf.add(inputs, self.num_add)   # shift the inputs by the constant num_add
        temp = tf.matmul(temp, self.kernel)
        return temp

"""""""""""""""""""""""""""""""""""""""""""""""""""""""""
: Model Initialize
"""""""""""""""""""""""""""""""""""""""""""""""""""""""""
model = tf.keras.models.Sequential([
    tf.keras.layers.InputLayer(input_shape=( 32, 32, 4 )),
    tf.keras.layers.Normalization(mean=3., variance=2.),
    tf.keras.layers.Normalization(mean=4., variance=6.),
    tf.keras.layers.Conv2D(32, (3, 3), activation='relu'),
    tf.keras.layers.MaxPooling2D((2, 2)),
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Reshape((128, 225)),
    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(96, return_sequences=True, return_state=False)),
    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(96)),
])

layer = MyDenseLayer(10, 5)

model.add(layer)
model.add(tf.keras.layers.Flatten())
model.add(tf.keras.layers.Dense(192, activation='relu'))
model.add(tf.keras.layers.Dense(10, activation='softmax'))
model.summary()
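
If you prefer not to write a manual loop, this model trains the same way with model.fit. A minimal sketch on random placeholder data (the input shape and the 10 classes simply mirror the InputLayer and the final Dense(10) above):

x_train = tf.random.normal((16, 32, 32, 4))
y_train = tf.random.uniform((16,), maxval=10, dtype=tf.int32)

model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
              loss=tf.keras.losses.SparseCategoricalCrossentropy(),
              metrics=['accuracy'])
model.fit(x_train, y_train, batch_size=4, epochs=1)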

