Tensorflow incompatible matrix size when using GradientTape
I am trying to run code that previously worked on tensorflow 2.2.0 on version 2.4.0-rc0 for Apple silicon (using Python 3.8), but it now generates the following error about the matrix dimensions:
```
tensorflow.python.framework.errors_impl.InvalidArgumentError: GetOutputShape: Matrix size-incompatible: In[0]: [256,4], In[1]: [4,400]
```
I am using nested gradient tapes to compute the gradient of my MLP model with respect to the inputs (which forms part of the loss), after which I compute the gradient of the loss with respect to the trainable variables, as below:
```python
def get_grad_and_loss(self, x, y):
    with tf.GradientTape(persistent=True) as gl_tape:
        gl_tape.watch(x)
        with tf.GradientTape(persistent=True) as l_tape:
            l_tape.watch(x)
            y_pred = self.call(x)
        # Gradient of the predictions w.r.t. the inputs (used in the loss).
        grad_mat = l_tape.gradient(y_pred, x)
        # MSE plus a soft penalty pushing the first input gradient to be positive.
        loss = tf.reduce_mean(tf.math.square(y_pred - y[:, tf.newaxis])) \
             + tf.reduce_mean(tf.maximum(0, -1 * grad_mat[:, 0]))
    # Gradient of the loss w.r.t. the trainable variables.
    g = gl_tape.gradient(loss, self.trainable_weights)
    return g, loss
```
In words, I am computing the MSE and trying to force the sign of the gradient to be positive (as a soft constraint). I have read through the documentation on gradient tape and, as I understand it, setting `persistent=True` should allow me to recompute gradients freely. As a side note, my code works fine if I omit the nested gradient tape and simply use the MSE metric, so I don't think the issue lies anywhere else in the code. Any pointers would be much appreciated, thanks in advance :)
You seem to have some confusion over which gradient tape watches which variables. I suggest making sure that the tapes watch different variables; presently they both watch `x`. Most likely you need to add `gl_tape.watch(self.trainable_weights)`. There are examples out there of two gradient tapes working together. Check them out.
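Below is a minimal sketch of that pattern, not the asker's actual code: the model, layer sizes, and random `x`/`y` data are made up purely for illustration (a toy two-layer MLP with 4 input features, echoing the `[256, 4]` batch in the error). The inner tape watches the inputs, the outer tape covers the trainable weights, and the inner gradient is taken inside the outer tape's context so the penalty term stays differentiable.

```python
import tensorflow as tf

# Toy stand-in for the asker's MLP: 4 input features, one hidden layer,
# a single output. Shapes and data are illustrative only.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(400, activation="tanh", input_shape=(4,)),
    tf.keras.layers.Dense(1),
])

x = tf.random.normal((256, 4))
y = tf.random.normal((256,))

with tf.GradientTape() as param_tape:
    # Outer tape: explicitly watch the trainable weights (the answer's suggestion).
    param_tape.watch(model.trainable_weights)
    with tf.GradientTape() as input_tape:
        # Inner tape: watch the inputs so d(y_pred)/d(x) can be taken.
        input_tape.watch(x)
        y_pred = model(x)
    # Input gradient, computed inside the outer tape so it stays differentiable.
    grad_mat = input_tape.gradient(y_pred, x)            # shape (256, 4)
    mse = tf.reduce_mean(tf.square(y_pred - y[:, tf.newaxis]))
    # Soft constraint: penalise negative gradients w.r.t. the first input feature.
    penalty = tf.reduce_mean(tf.maximum(0.0, -grad_mat[:, 0]))
    loss = mse + penalty

grads = param_tape.gradient(loss, model.trainable_weights)
print([g.shape for g in grads])
```

Note that `persistent=True` is not needed here because each tape's `gradient()` is only called once, and trainable `tf.Variable`s are normally watched automatically, so the explicit `watch` call mainly makes the intent clear.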