Computing the gradients of the new state (of the RNN) with respect to model parameters (including the CNN for inputs) in TensorFlow; tf.gradients returns None
Summary:
I have a 1D CNN that extracts features from the inputs. The CNN is then followed by an RNN. I am looking for a way to backpropagate gradients from the new_state of the RNN to the CNN parameters. Also, we can consider a conv layer with kernel size [1, 1, input_num_features, output_num_features]. The code is below:
import tensorflow as tf
mseed = 123
tf.set_random_seed(mseed)
kernel_initializer = tf.glorot_normal_initializer(seed=mseed)
# Graph Hyperparameters
cell_size = 64
num_classes = 2
m_dtype = tf.float32
num_features = 30
inputs_train_ph = tf.placeholder(dtype=m_dtype, shape=[None, 75, num_features], name="inputs_train_ph")
inputs_devel_ph = tf.placeholder(dtype=m_dtype, shape=[None, 75, num_features], name="inputs_devel_ph")
labels_train_ph = tf.placeholder(dtype=m_dtype, shape=[None, 75, num_classes], name="labels_train_ph")
labels_devel_ph = tf.placeholder(dtype=m_dtype, shape=[None, 75, num_classes], name="labels_devel_ph")
def return_inputs_train(): return inputs_train_ph
def return_inputs_devel(): return inputs_devel_ph
def return_labels_train(): return labels_train_ph
def return_labels_devel(): return labels_devel_ph
phase_train = tf.placeholder(tf.bool, shape=())
dropout = tf.placeholder(dtype=m_dtype, shape=())
initial_state = tf.placeholder(shape=[None, cell_size], dtype=m_dtype, name="initial_state")
inputs = tf.cond(phase_train, return_inputs_train, return_inputs_devel)
labels = tf.cond(phase_train, return_labels_train, return_labels_devel)
# Graph
def model(inputs):
    used = tf.sign(tf.reduce_max(tf.abs(inputs), 2))
    length = tf.reduce_sum(used, 1)
    length = tf.cast(length, tf.int32)
    with tf.variable_scope('layer_cell'):
        inputs = tf.layers.conv1d(inputs, filters=100, kernel_size=3, padding="same",
                                  kernel_initializer=tf.glorot_normal_initializer(seed=mseed))
        inputs = tf.layers.batch_normalization(inputs, training=phase_train, name="bn")
        inputs = tf.nn.relu(inputs)
    with tf.variable_scope('lstm_model'):
        cell = tf.nn.rnn_cell.GRUCell(cell_size, kernel_initializer=kernel_initializer)
        cell = tf.nn.rnn_cell.DropoutWrapper(cell, input_keep_prob=1.0 - dropout, state_keep_prob=1.0 - dropout)
        output, new_state = tf.nn.dynamic_rnn(cell, inputs, dtype=m_dtype, sequence_length=length,
                                              initial_state=initial_state)
    with tf.variable_scope("output"):
        output = tf.reshape(output, shape=[-1, cell_size])
        output = tf.layers.dense(output, units=num_classes,
                                 kernel_initializer=kernel_initializer)
        output = tf.reshape(output, shape=[5, -1, num_classes])
        used = tf.expand_dims(used, 2)
        output = output * used
    return output, new_state
output, new_state = model(inputs)
grads_new_state_wrt_vars = tf.gradients(new_state, tf.trainable_variables())
for g in grads_new_state_wrt_vars:
    print('**', g)
init_op = tf.global_variables_initializer()
with tf.Session() as sess:
    sess.run(init_op)
Please note that when I printed out the gradient tensors, I got the following:
for g in grads_new_state_wrt_vars:
    print('**', g)
** None
** None
** None
** None
** Tensor("gradients/model/lstm_model/rnn/while/gru_cell/MatMul/Enter_grad/b_acc_3:0", shape=(220, 240), dtype=float64)
** Tensor("gradients/model/lstm_model/rnn/while/gru_cell/BiasAdd/Enter_grad/b_acc_3:0", shape=(240,), dtype=float64)
** Tensor("gradients/model/lstm_model/rnn/while/gru_cell/MatMul_1/Enter_grad/b_acc_3:0", shape=(220, 120), dtype=float64)
** Tensor("gradients/model/lstm_model/rnn/while/gru_cell/BiasAdd_1/Enter_grad/b_acc_3:0", shape=(120,), dtype=float64)
** None
** None
Finally, the weights in the network are printed below:
for v in tf.trainable_variables():
    print(v.name)
model/conv1d/kernel:0
model/conv1d/bias:0
model/bn/gamma:0
model/bn/beta:0
model/lstm_model/rnn/gru_cell/gates/kernel:0
model/lstm_model/rnn/gru_cell/gates/bias:0
model/lstm_model/rnn/gru_cell/candidate/kernel:0
model/lstm_model/rnn/gru_cell/candidate/bias:0
model/output/dense/kernel:0
model/output/dense/bias:0
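Since tf.gradients returns one entry per variable in the list it is given, the two printouts above line up index by index. A minimal sketch of a helper that pairs them and reports which variables have no gradient path follows. The helper name variables_without_gradient is hypothetical, and the "grad" strings are stand-ins for the gradient tensors so that the snippet runs without TensorFlow:

```python
def variables_without_gradient(var_names, grads):
    """Return the names whose gradient entry is None (no path to the target)."""
    if len(var_names) != len(grads):
        raise ValueError("expected one gradient entry per variable")
    return [name for name, grad in zip(var_names, grads) if grad is None]


# Variable names copied from the listing above.
var_names = [
    "model/conv1d/kernel:0",
    "model/conv1d/bias:0",
    "model/bn/gamma:0",
    "model/bn/beta:0",
    "model/lstm_model/rnn/gru_cell/gates/kernel:0",
    "model/lstm_model/rnn/gru_cell/gates/bias:0",
    "model/lstm_model/rnn/gru_cell/candidate/kernel:0",
    "model/lstm_model/rnn/gru_cell/candidate/bias:0",
    "model/output/dense/kernel:0",
    "model/output/dense/bias:0",
]
# Same None pattern as the gradient printout; "grad" is a placeholder string,
# not a real tensor.
grads = [None] * 4 + ["grad"] * 4 + [None] * 2

missing = variables_without_gradient(var_names, grads)
for name in missing:
    print("no gradient:", name)
```

In the graph code itself, the same helper could presumably be called as variables_without_gradient([v.name for v in tf.trainable_variables()], grads_new_state_wrt_vars). The two dense entries are expected to be missing, because new_state is computed before the dense layer.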
So why can't the gradients be computed with respect to the weights of the first conv and batch norm layers in the network?
Please note that I don't have the same problem when I replace new_state with output in tf.gradients(new_state, tf.trainable_variables()).
Any help is much appreciated!
I found out that if I change the None in the placeholders defined above to a fixed batch size, the problem is solved, and I get the gradients of new_state with respect to the conv layers. This works as long as the batch size is the same in both the train and devel placeholders, e.g.:
inputs_train_ph = tf.placeholder(dtype=m_dtype, shape=[34, 75, num_features], name="inputs_train_ph")
inputs_devel_ph = tf.placeholder(dtype=m_dtype, shape=[34, 75, num_features], name="inputs_devel_ph")
labels_train_ph = tf.placeholder(dtype=m_dtype, shape=[34, 75, num_classes], name="labels_train_ph")
labels_devel_ph = tf.placeholder(dtype=m_dtype, shape=[34, 75, num_classes], name="labels_devel_ph")
Otherwise, I run into the error again.
Please note that the gradients of the output with respect to the conv layer are not affected by the None for the batch size in the placeholders defined above.
Now I would like to know: why do I get this error if I don't change the None to batch_size?
I'm not sure whether this constitutes an exact answer to your question, but it might be a good start, as your issue does not seem to be getting much attention (and, tbh, I'm not surprised given this amount of confusion).
Furthermore, there are tons of questions others might have as well (way too many for comments), so I will bring them up here and try to provide an answer to the situation at hand.
This problem does not exist for either output or new_state when batch_size is unspecified.
Gradients wrt new_state, e.g. grads_new_state_wrt_vars = tf.gradients(new_state, tf.trainable_variables()), return:
** Tensor("gradients/layer_cell/conv1d/BiasAdd_grad/BiasAddGrad:0", shape=(100,), dtype=float32)
** Tensor("gradients/layer_cell/bn/batchnorm/mul_grad/Mul_1:0", shape=(100,), dtype=float32)
** Tensor("gradients/layer_cell/bn/batchnorm/add_1_grad/Reshape_1:0", shape=(100,), dtype=float32)
** Tensor("gradients/lstm_model/rnn/while/gru_cell/MatMul/Enter_grad/b_acc_3:0", shape=(164, 128), dtype=float32)
** Tensor("gradients/lstm_model/rnn/while/gru_cell/BiasAdd/Enter_grad/b_acc_3:0", shape=(128,), dtype=float32)
** Tensor("gradients/lstm_model/rnn/while/gru_cell/MatMul_1/Enter_grad/b_acc_3:0", shape=(164, 64), dtype=float32)
** Tensor("gradients/lstm_model/rnn/while/gru_cell/BiasAdd_1/Enter_grad/b_acc_3:0", shape=(64,), dtype=float32)
** None
** None
as expected (the last two are None because new_state doesn't pass through the Dense part of your network).
Gradients wrt output_state, e.g. grads_new_state_wrt_vars = tf.gradients(output_state, tf.trainable_variables()), return:
** Tensor("gradients/layer_cell/conv1d/BiasAdd_grad/BiasAddGrad:0", shape=(100,), dtype=float32)
** Tensor("gradients/layer_cell/bn/batchnorm/mul_grad/Mul_1:0", shape=(100,), dtype=float32)
** Tensor("gradients/layer_cell/bn/batchnorm/add_1_grad/Reshape_1:0", shape=(100,), dtype=float32)
** Tensor("gradients/lstm_model/rnn/while/gru_cell/MatMul/Enter_grad/b_acc_3:0", shape=(164, 128), dtype=float32)
** Tensor("gradients/lstm_model/rnn/while/gru_cell/BiasAdd/Enter_grad/b_acc_3:0", shape=(128,), dtype=float32)
** Tensor("gradients/lstm_model/rnn/while/gru_cell/MatMul_1/Enter_grad/b_acc_3:0", shape=(164, 64), dtype=float32)
** Tensor("gradients/lstm_model/rnn/while/gru_cell/BiasAdd_1/Enter_grad/b_acc_3:0", shape=(64,), dtype=float32)
** Tensor("gradients/output/dense/MatMul_grad/MatMul_1:0", shape=(64, 2), dtype=float32)
** Tensor("gradients/output/dense/BiasAdd_grad/BiasAddGrad:0", shape=(2,), dtype=float32)
Once again, everything is fine.
When batch size is specified as you described, e.g.
inputs_train_ph = tf.placeholder(dtype=m_dtype, shape=[34, 75, num_features], name="inputs_train_ph")
inputs_devel_ph = tf.placeholder(dtype=m_dtype, shape=[34, 75, num_features], name="inputs_devel_ph")
your network DOES NOT work at all, which is no surprise, as your shapes do not match in the output namespace.
Furthermore, labels_devel_ph and labels_train_ph do not matter for the code in question. Why put them here? They just clutter the question even further. Please see Minimal, Complete and Verifiable Example; a lot of the parts are not needed at all for the task in question.
Next, there is no connection between the batch shapes of inputs_train_ph and inputs_devel_ph. Why would there be? One is independent of the other, and only one is used at a time due to tf.cond (which has to be provided as a value in session run, but that's outside the scope of this question).
The problematic output part is exactly this:
output = tf.reshape(output, shape=[5, -1, num_classes])
used = tf.expand_dims(used, 2)
output = output * used
Tensor shapes using your approach:
Output shape: (5, 480, 2)
Used initial shape: (32, 75)
Used after expand_dims: (32, 75, 1)
Obviously (5, 480, 2) multiplied by (32, 75, 1) will not work. I see no possibility of this ever working, even with a pre-alpha version of Tensorflow, which makes me think other parts of your source code make it work, and who knows what else influences it, to be honest.
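To make the mismatch concrete, here is a small pure-Python sketch of the usual broadcasting rule (align shapes from the right; each pair of dimensions must be equal or one of them must be 1), applied to the shapes listed above. The helper broadcast_shape is written for this illustration and is not a TensorFlow function; the batch size of 32 is taken from the shapes above:

```python
def broadcast_shape(a, b):
    """Return the broadcast result of two shapes, or None if incompatible."""
    # Pad the shorter shape with leading 1s so both have the same rank.
    rank = max(len(a), len(b))
    a = (1,) * (rank - len(a)) + tuple(a)
    b = (1,) * (rank - len(b)) + tuple(b)
    result = []
    for dim_a, dim_b in zip(a, b):
        if dim_a == dim_b or dim_b == 1:
            result.append(dim_a)
        elif dim_a == 1:
            result.append(dim_b)
        else:
            return None  # neither equal nor 1: cannot broadcast
    return tuple(result)


batch_size, seq_len, num_classes = 32, 75, 2

# The dense layer yields batch_size * seq_len rows of num_classes values;
# reshaping to [5, -1, num_classes] turns the -1 into this many entries.
total_values = batch_size * seq_len * num_classes
second_dim = total_values // (5 * num_classes)

output_shape = (5, second_dim, num_classes)
used_shape = (batch_size, seq_len, 1)

print(output_shape, "*", used_shape, "->", broadcast_shape(output_shape, used_shape))
print((batch_size, seq_len, num_classes), "*", used_shape, "->",
      broadcast_shape((batch_size, seq_len, num_classes), used_shape))
```

The first pair fails on the leading dimension (5 against 32), while a mask of shape (32, 75, 1) would broadcast cleanly against an output kept in (32, 75, 2) layout.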
The problem with used can be solved in multiple ways, but I think what you wanted is to stack used in another dimension and reshape afterwards (it will not work without reshaping):
output = tf.reshape(output, shape=[5, -1, num_classes])
used = tf.stack((used, used), 2)
used = tf.reshape(used, shape=(5, -1, num_classes))
output = output * used
Using this approach, every batch shape passes through the network without a problem.
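As a quick sanity check of that claim, the sketch below resolves the -1 of the reshape for a few batch sizes, using only shape arithmetic. The helper resolve_reshape and the sample batch sizes are assumptions made for this illustration; the sequence length of 75 and num_classes of 2 come from the question:

```python
def resolve_reshape(num_elements, shape):
    """Resolve a single -1 in `shape` the way a reshape op does, or raise."""
    if shape.count(-1) > 1:
        raise ValueError("at most one dimension may be -1")
    known = 1
    for dim in shape:
        if dim != -1:
            known *= dim
    if known == 0 or num_elements % known != 0:
        raise ValueError("cannot reshape %d elements into %r" % (num_elements, shape))
    if -1 not in shape and known != num_elements:
        raise ValueError("cannot reshape %d elements into %r" % (num_elements, shape))
    return tuple(num_elements // known if dim == -1 else dim for dim in shape)


seq_len, num_classes = 75, 2
shapes = {}
for batch_size in (1, 7, 32, 34):
    # The dense output and the stacked mask both hold
    # batch_size * seq_len * num_classes values.
    num_elements = batch_size * seq_len * num_classes
    output_shape = resolve_reshape(num_elements, (5, -1, num_classes))
    used_shape = resolve_reshape(num_elements, (5, -1, num_classes))
    shapes[batch_size] = (output_shape, used_shape)
    print(batch_size, output_shape, used_shape)
```

Because 75 is divisible by 5, the middle dimension always resolves to batch_size * 15, so the two shapes agree for any batch size. Both tensors are also flattened in the same row-major order, so the mask value for a given (batch, step, class) position still lines up with the matching output value after the reshape.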
BTW, I'm not entirely sure what you are trying to achieve with used at the beginning either. Maybe it fits your use case, but IMO the intent is totally unreadable when compared to the final result (columns containing zeros when all features in the sequence are zero, and ones otherwise).
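For readers puzzling over used, the following pure-Python sketch mirrors tf.sign(tf.reduce_max(tf.abs(inputs), 2)) and the length computation on nested lists. The helper names and the toy batch are invented for illustration, and the reading of all-zero steps as padding is an assumption about the intent rather than something the question states:

```python
def used_mask(batch):
    """1.0 for every time step with at least one non-zero feature, else 0.0.

    `batch` is a nested list shaped [batch, time, features].
    """
    return [
        [1.0 if max(abs(x) for x in step) > 0 else 0.0 for step in sequence]
        for sequence in batch
    ]


def sequence_lengths(mask):
    """Number of non-zero time steps per sequence, as integers."""
    return [int(sum(row)) for row in mask]


# Toy batch: 2 sequences, 4 time steps, 3 features; all-zero steps act as padding.
batch = [
    [[0.5, -1.0, 0.0], [0.0, 2.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]],
    [[1.0, 0.0, 0.0], [0.0, 0.0, -3.0], [0.1, 0.0, 0.0], [0.0, 0.0, 0.0]],
]

mask = used_mask(batch)
lengths = sequence_lengths(mask)
print(mask)     # per-step mask, shape [batch, time]
print(lengths)  # what would be fed to sequence_length
```

Under that reading, used marks the real time steps, length tells the RNN where each sequence ends, and multiplying the output by used zeroes the predictions on padded steps.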
Tested with version 1.8 of Tensorflow (current is 1.12 and 2.0 is around the corner), and I was able to reproduce your problem.
Still, the output shapes crash as described above.
Actually, version 1.9 already has this problem solved.
I have tried to pinpoint a possible reason, but I still don't know. Some ideas:
Using tf.layers instead of tf.nn. As planned for version 2.0, this module will be deprecated, and given how messy the framework is, I assumed something might be off in this case as well. I changed conv1d to its tf.keras.layers counterpart and batch normalization to tf.nn.batch_normalization; unluckily, to no avail, the results are still the same.
According to the 1.9 release notes: Prevent tf.gradients() from backpropagating through integer tensors. Maybe this has something to do with your problem? Maybe the backpropagation graph is somehow stuck in v1.8.0?
Problems with scope, tf.variable_scope and tf.layers: as expected, nothing changed after removing the namespaces.
All in all: update your dependencies to version 1.9 or above; it solves all your problems (though why they occur is rather hard to grasp, as is this framework).
Actually, it's beyond me why such changes happen between minor versions and are not described in more depth in the changelog, but maybe I'm missing something big...
Maybe you should consider using tf.estimator, tf.keras, or another framework like PyTorch? This code is highly unreadable, hard to debug, and ugly. Newer practices exist, and they should help you and us (if you have another problem with your neural networks and decide to ask on StackOverflow).