Tensorflow incompatible matrix size when using GradientTape
I am trying to run code that previously worked on tensorflow 2.2.0 on version 2.4.0-rc0 for Apple silicon (using Python 3.8), but it now generates the following error about the matrix dimensions:
```
tensorflow.python.framework.errors_impl.InvalidArgumentError: GetOutputShape: Matrix size-incompatible: In[0]: [256,4], In[1]: [4,400]
```
I am using nested gradient tapes to compute the gradient of my MLP model with respect to the inputs (which forms part of the loss), after which I compute the gradient of the loss with respect to the trainable variables, as below:
```python
def get_grad_and_loss(self, x, y):
    with tf.GradientTape(persistent=True) as gl_tape:
        gl_tape.watch(x)
        with tf.GradientTape(persistent=True) as l_tape:
            l_tape.watch(x)
            y_pred = self.call(x)
        # Gradient of the predictions w.r.t. the inputs (used in the loss).
        grad_mat = l_tape.gradient(y_pred, x)
        # MSE plus a soft penalty pushing the first input gradient to be positive.
        loss = tf.reduce_mean(tf.math.square(y_pred - y[:, tf.newaxis])) \
             + tf.reduce_mean(tf.maximum(0, -1 * grad_mat[:, 0]))
    # Gradient of the loss w.r.t. the trainable variables.
    g = gl_tape.gradient(loss, self.trainable_weights)
    return g, loss
```
In words, I am computing the MSE and trying to force the sign of the gradient to be positive (as a soft constraint). I have read through the documentation on gradient tape and, as I understand it, setting `persistent=True` should allow me to recompute gradients freely. As a side note, my code works fine if I omit the nested gradient tape and simply use the MSE metric, so I don't think the issue lies anywhere else in the code. Any pointers would be much appreciated, thanks in advance :)
You seem to have some confusion over which gradient tape watches which variables. I suggest making sure that the tapes watch different variables; presently they both watch `x`. Most likely you need to add `gl_tape.watch(self.trainable_weights)`. There are examples out there of two gradient tapes working together. Check them out.
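Below is a minimal sketch of that pattern, not the asker's actual code: the model, layer sizes, and random `x`/`y` data are made up purely for illustration (a toy two-layer MLP with 4 input features, echoing the `[256, 4]` batch in the error). The inner tape watches the inputs, the outer tape covers the trainable weights, and the inner gradient is taken inside the outer tape's context so the penalty term stays differentiable.

```python
import tensorflow as tf

# Toy stand-in for the asker's MLP: 4 input features, one hidden layer,
# a single output. Shapes and data are illustrative only.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(400, activation="tanh", input_shape=(4,)),
    tf.keras.layers.Dense(1),
])

x = tf.random.normal((256, 4))
y = tf.random.normal((256,))

with tf.GradientTape() as param_tape:
    # Outer tape: explicitly watch the trainable weights (the answer's suggestion).
    param_tape.watch(model.trainable_weights)
    with tf.GradientTape() as input_tape:
        # Inner tape: watch the inputs so d(y_pred)/d(x) can be taken.
        input_tape.watch(x)
        y_pred = model(x)
    # Input gradient, computed inside the outer tape so it stays differentiable.
    grad_mat = input_tape.gradient(y_pred, x)            # shape (256, 4)
    mse = tf.reduce_mean(tf.square(y_pred - y[:, tf.newaxis]))
    # Soft constraint: penalise negative gradients w.r.t. the first input feature.
    penalty = tf.reduce_mean(tf.maximum(0.0, -grad_mat[:, 0]))
    loss = mse + penalty

grads = param_tape.gradient(loss, model.trainable_weights)
print([g.shape for g in grads])
```

Note that `persistent=True` is not needed here because each tape's `gradient()` is only called once, and trainable `tf.Variable`s are normally watched automatically, so the explicit `watch` call mainly makes the intent clear.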