TensorFlow GradientDescentOptimizer: Incompatible shapes between op input and calculated input gradient
The model worked well before the optimization step. However, when I tried to optimize the model, this error message showed up:
Incompatible shapes between op input and calculated input gradient. Forward operation: softmax_cross_entropy_with_logits_sg_12. Input index: 0. Original input shape: (16, 1). Calculated input gradient shape: (16, 16)
The following is my code.
import numpy as np
import tensorflow as tf
batch_size = 16
size = 400
labels = tf.placeholder(tf.int32, batch_size)
doc_encode = tf.placeholder(tf.float32, [batch_size, size])
W1 = tf.Variable(np.random.rand(size, 100), dtype=tf.float32, name='W1')
b1 = tf.Variable(np.zeros((100)), dtype=tf.float32, name='b1')
W2 = tf.Variable(np.random.rand(100, 1),dtype=tf.float32, name='W2')
b2 = tf.Variable(np.zeros((1)), dtype=tf.float32, name='b2')
D1 = tf.nn.relu(tf.matmul(doc_encode, W1) + b1)
D2 = tf.nn.selu(tf.matmul(D1, W2) + b2)
cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=labels, logits=D2))
optim = tf.train.GradientDescentOptimizer(0.01).minimize(cost, aggregation_method=tf.AggregationMethod.EXPERIMENTAL_TREE)
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    _cost, _optim = sess.run([cost, optim], {labels: np.array([1, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 1, 0, 1, 1]), doc_encode: np.random.rand(batch_size, size)})
Correct the following things.
First,
Change the placeholders' input shapes to this:
X = tf.placeholder(tf.float32, shape=[None, 400])  # features; float32 so they can be multiplied with W1
Y = tf.placeholder(tf.float32, shape=[None, 1])    # labels; same shape as the logits
Why None? Because it gives you the freedom to feed a batch of any size. This is preferred because during training you want to use mini-batches, but at prediction or inference time you will generally feed a single example. Marking the dimension as None takes care of that.
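One plausible reading of the (16, 16) gradient shape in the error is that the rank-1 labels of shape (16,) were broadcast against the (16, 1) logits. The sketch below reproduces that shape arithmetic in plain Python. Note that broadcast_shape is a hypothetical helper written for illustration, not a TensorFlow function, and it only assumes the usual NumPy-style broadcasting rule.

```python
def broadcast_shape(a, b):
    """Hypothetical helper: result shape under NumPy-style broadcasting."""
    result = []
    # Walk both shapes from the trailing dimension, padding the shorter one with 1s.
    for i in range(1, max(len(a), len(b)) + 1):
        da = a[-i] if i <= len(a) else 1
        db = b[-i] if i <= len(b) else 1
        if da != db and 1 not in (da, db):
            raise ValueError(f"cannot broadcast {a} with {b}")
        result.append(max(da, db))
    return tuple(reversed(result))


logits_shape = (16, 1)  # shape of D2 in the question

# Labels declared as a flat vector, as in the question.
print(broadcast_shape((16,), logits_shape))    # (16, 16)

# Labels declared with the same shape as the logits.
print(broadcast_shape((16, 1), logits_shape))  # (16, 1)
```

If the labels placeholder has shape [None, 1], the array fed to it needs the same rank, for example by reshaping the flat label vector to one column before feeding it.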
Second,
Correct your weight initialization. You are feeding in unscaled random values (np.random.rand draws from [0, 1)), and a random initializer can in general produce negative values too. It is usually recommended to initialize with small positive values. (I see you are using relu as the activation. Its gradient is zero wherever its input is negative, so a unit whose pre-activation stays negative is never updated by gradient descent; in other words, its weights are useless.)
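To make the relu point concrete, here is a minimal plain-Python sketch. The 0.01 scale and the names relu, relu_grad and weights are assumptions chosen for illustration; they do not come from the original code.

```python
import random


def relu(z):
    """relu activation for a single pre-activation value z."""
    return max(0.0, z)


def relu_grad(z):
    """Derivative of relu with respect to its input z."""
    return 1.0 if z > 0 else 0.0


random.seed(0)
# Assumed scale of 0.01: small positive initial weights in [0, 0.01).
weights = [0.01 * random.random() for _ in range(5)]

# The gradient is 0 for negative inputs and 1 for positive inputs.
for z in (-2.0, -0.5, 0.5, 2.0):
    print(z, relu(z), relu_grad(z))
```

A unit whose input to relu is negative for every training example gets a zero gradient every time, which is the situation the advice above is trying to avoid.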
Third,
Logits are the result you obtain from W2*x + b2. The tf.nn.softmax_cross.....(..) call applies the softmax activation automatically, so there is no need for SELU on the last layer.
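As a rough illustration of what "applies softmax automatically" means, the following plain-Python sketch computes a softmax cross-entropy directly from raw logits. The function names are hypothetical, and this is only the textbook formula, not TensorFlow's implementation.

```python
import math


def softmax(logits):
    """Numerically stable softmax over one row of raw logits."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def softmax_cross_entropy(labels, logits):
    """Cross-entropy between one-hot labels and softmax(logits) for one row."""
    probs = softmax(logits)
    return -sum(y * math.log(p) for y, p in zip(labels, probs))


# Raw, un-activated logits go in; softmax happens inside the loss.
loss = softmax_cross_entropy([0.0, 1.0], [2.0, 2.0])
print(loss)  # log(2), since both classes get probability 0.5

# With a single logit per row, softmax is 1.0 whatever the value is.
print(softmax([3.7]))  # [1.0]
```

The last line is worth keeping in mind for a one-column output like the (16, 1) logits in the question: softmax over a single class carries no information, so it may be worth checking whether one output column is what you want with this loss.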