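Computing gradients of the new state (of an RNN) with respect to model parameters (including the CNN for the inputs) in TensorFlow; tf.gradients returns None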

Question

Summary:

I have a 1D CNN that extracts features from the inputs. The CNN is then followed by an RNN. I am looking for a way to backpropagate gradients from the new_state of the RNN to the CNN parameters. Also, we can consider a conv layer with kernel size [1, 1, input_num_features, output_num_features]. The code is below:

import tensorflow as tf

mseed = 123
tf.set_random_seed(mseed)
kernel_initializer = tf.glorot_normal_initializer(seed=mseed)

# Graph Hyperparameters
cell_size = 64
num_classes = 2
m_dtype = tf.float32
num_features = 30

inputs_train_ph = tf.placeholder(dtype=m_dtype, shape=[None, 75, num_features], name="inputs_train_ph")
inputs_devel_ph = tf.placeholder(dtype=m_dtype, shape=[None, 75, num_features], name="inputs_devel_ph")

labels_train_ph = tf.placeholder(dtype=m_dtype, shape=[None, 75, num_classes], name="labels_train_ph")
labels_devel_ph = tf.placeholder(dtype=m_dtype, shape=[None, 75, num_classes], name="labels_devel_ph")

def return_inputs_train(): return inputs_train_ph
def return_inputs_devel(): return inputs_devel_ph
def return_labels_train(): return labels_train_ph
def return_labels_devel(): return labels_devel_ph

phase_train = tf.placeholder(tf.bool, shape=())
dropout = tf.placeholder(dtype=m_dtype, shape=())
initial_state = tf.placeholder(shape=[None, cell_size], dtype=m_dtype, name="initial_state")

inputs = tf.cond(phase_train, return_inputs_train, return_inputs_devel)
labels = tf.cond(phase_train, return_labels_train, return_labels_devel)

# Graph
def model(inputs):
    used = tf.sign(tf.reduce_max(tf.abs(inputs), 2))

    length = tf.reduce_sum(used, 1)
    length = tf.cast(length, tf.int32)

    with tf.variable_scope('layer_cell'):
        inputs = tf.layers.conv1d(inputs, filters=100, kernel_size=3, padding="same",
                                  kernel_initializer=tf.glorot_normal_initializer(seed=mseed))
        inputs = tf.layers.batch_normalization(inputs, training=phase_train, name="bn")
        inputs = tf.nn.relu(inputs)

    with tf.variable_scope('lstm_model'):

        cell = tf.nn.rnn_cell.GRUCell(cell_size, kernel_initializer=kernel_initializer)
        cell = tf.nn.rnn_cell.DropoutWrapper(cell, input_keep_prob=1.0 - dropout, state_keep_prob=1.0 - dropout)
        output, new_state = tf.nn.dynamic_rnn(cell, inputs, dtype=m_dtype, sequence_length=length,
                                              initial_state=initial_state)

    with tf.variable_scope("output"):
        output = tf.reshape(output, shape=[-1, cell_size])
        output = tf.layers.dense(output, units=num_classes,
                                 kernel_initializer=kernel_initializer)

        output = tf.reshape(output, shape=[5, -1, num_classes])
        used = tf.expand_dims(used, 2)

        output = output * used

    return output, new_state


output, new_state = model(inputs)

grads_new_state_wrt_vars = tf.gradients(new_state, tf.trainable_variables())
for g in grads_new_state_wrt_vars:
    print('**', g)

init_op = tf.global_variables_initializer()

with tf.Session() as sess:
    sess.run(init_op)

Please note that when I printed out the gradient tensors, I got the following:

for g in grads_new_state_wrt_vars:
    print('**', g)

** None
** None
** None
** None
** Tensor("gradients/model/lstm_model/rnn/while/gru_cell/MatMul/Enter_grad/b_acc_3:0", shape=(220, 240), dtype=float64)
** Tensor("gradients/model/lstm_model/rnn/while/gru_cell/BiasAdd/Enter_grad/b_acc_3:0", shape=(240,), dtype=float64)
** Tensor("gradients/model/lstm_model/rnn/while/gru_cell/MatMul_1/Enter_grad/b_acc_3:0", shape=(220, 120), dtype=float64)
** Tensor("gradients/model/lstm_model/rnn/while/gru_cell/BiasAdd_1/Enter_grad/b_acc_3:0", shape=(120,), dtype=float64)
** None
** None

Finally, the weights in the network are printed below:

for v in tf.trainable_variables():
    print(v.name)

model/conv1d/kernel:0
model/conv1d/bias:0
model/bn/gamma:0
model/bn/beta:0
model/lstm_model/rnn/gru_cell/gates/kernel:0
model/lstm_model/rnn/gru_cell/gates/bias:0
model/lstm_model/rnn/gru_cell/candidate/kernel:0
model/lstm_model/rnn/gru_cell/candidate/bias:0
model/output/dense/kernel:0
model/output/dense/bias:0

So why can't the gradients be computed with respect to the weights of the first conv and batch norm layers in the network?

Please note that I don't have the same problem when replacing new_state with output in tf.gradients(new_state, tf.trainable_variables()).

Any help is much appreciated!

Edit

I found out that if I replace the None in the placeholders defined above with a fixed batch size, the problem is solved, and I get the gradients of new_state wrt the conv layers. This works as long as the defined batch size is the same in both the train and devel placeholders, e.g.:

inputs_train_ph = tf.placeholder(dtype=m_dtype, shape=[34, 75, num_features], name="inputs_train_ph")
inputs_devel_ph = tf.placeholder(dtype=m_dtype, shape=[34, 75, num_features], name="inputs_devel_ph")

labels_train_ph = tf.placeholder(dtype=m_dtype, shape=[34, 75, num_classes], name="labels_train_ph")
labels_devel_ph = tf.placeholder(dtype=m_dtype, shape=[34, 75, num_classes], name="labels_devel_ph")

Otherwise, I run into the error again.

Please note that the gradients of output wrt the conv layer are not affected by the None batch size in the placeholders defined above.

Now I would like to know why I get this error when I don't change None to batch_size.

Answer 1

I'm not sure whether this constitutes an exact answer to your question, but it might be a good start, as your issue does not seem to be getting much attention (and, tbh, I'm not surprised given this amount of confusion).

Furthermore, there are tons of questions others might have as well (way too many for comments), so I will bring them up here and try to address the situation at hand.

Tensorflow version: 1.12

1. None gradients

This problem does not exist for either output or new_state when batch_size is unspecified.

Gradients wrt new_state, e.g. grads_new_state_wrt_vars = tf.gradients(new_state, tf.trainable_variables()), return:

** Tensor("gradients/layer_cell/conv1d/BiasAdd_grad/BiasAddGrad:0", shape=(100,), dtype=float32)
** Tensor("gradients/layer_cell/bn/batchnorm/mul_grad/Mul_1:0", shape=(100,), dtype=float32)
** Tensor("gradients/layer_cell/bn/batchnorm/add_1_grad/Reshape_1:0", shape=(100,), dtype=float32)
** Tensor("gradients/lstm_model/rnn/while/gru_cell/MatMul/Enter_grad/b_acc_3:0", shape=(164, 128), dtype=float32)
** Tensor("gradients/lstm_model/rnn/while/gru_cell/BiasAdd/Enter_grad/b_acc_3:0", shape=(128,), dtype=float32)
** Tensor("gradients/lstm_model/rnn/while/gru_cell/MatMul_1/Enter_grad/b_acc_3:0", shape=(164, 64), dtype=float32)
** Tensor("gradients/lstm_model/rnn/while/gru_cell/BiasAdd_1/Enter_grad/b_acc_3:0", shape=(64,), dtype=float32)
** None
** None

as expected, since new_state does not pass through the Dense part of your network.
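To make the "does not pass through" argument concrete, here is a toy sketch in plain Python. It is not how tf.gradients is implemented; it only models the idea that a variable gets a None gradient when the target does not depend on it. The graph and its node names are hypothetical and only loosely mirror the model in the question.

```python
# Toy dependency graph (hypothetical node names, loosely mirroring the model).
# Each key is a node; its value lists the nodes it is computed from.
GRAPH = {
    "conv_out": ["inputs", "conv/kernel", "conv/bias"],
    "bn_out": ["conv_out", "bn/gamma", "bn/beta"],
    "rnn_output": ["bn_out", "gru/gates", "gru/candidate"],
    "new_state": ["bn_out", "gru/gates", "gru/candidate"],
    "output": ["rnn_output", "dense/kernel", "dense/bias"],
}


def ancestors(target, graph):
    """Return every node that `target` depends on, directly or indirectly."""
    seen = set()
    stack = [target]
    while stack:
        node = stack.pop()
        for parent in graph.get(node, []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def has_gradient(target, variable, graph=GRAPH):
    """True if a dependency path exists; False corresponds to a None gradient."""
    return variable in ancestors(target, graph)


# Print, per variable, whether new_state and output depend on it.
for var in ["conv/kernel", "bn/gamma", "gru/gates", "dense/kernel"]:
    print(var, has_gradient("new_state", var), has_gradient("output", var))
```

In this toy graph the dense variables are reachable from output but not from new_state, which matches the two None entries at the end of the list above.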

Gradients wrt output_state, e.g. grads_new_state_wrt_vars = tf.gradients(output_state, tf.trainable_variables()), return:

** Tensor("gradients/layer_cell/conv1d/BiasAdd_grad/BiasAddGrad:0", shape=(100,), dtype=float32)
** Tensor("gradients/layer_cell/bn/batchnorm/mul_grad/Mul_1:0", shape=(100,), dtype=float32)
** Tensor("gradients/layer_cell/bn/batchnorm/add_1_grad/Reshape_1:0", shape=(100,), dtype=float32)
** Tensor("gradients/lstm_model/rnn/while/gru_cell/MatMul/Enter_grad/b_acc_3:0", shape=(164, 128), dtype=float32)
** Tensor("gradients/lstm_model/rnn/while/gru_cell/BiasAdd/Enter_grad/b_acc_3:0", shape=(128,), dtype=float32)
** Tensor("gradients/lstm_model/rnn/while/gru_cell/MatMul_1/Enter_grad/b_acc_3:0", shape=(164, 64), dtype=float32)
** Tensor("gradients/lstm_model/rnn/while/gru_cell/BiasAdd_1/Enter_grad/b_acc_3:0", shape=(64,), dtype=float32)
** Tensor("gradients/output/dense/MatMul_grad/MatMul_1:0", shape=(64, 2), dtype=float32)
** Tensor("gradients/output/dense/BiasAdd_grad/BiasAddGrad:0", shape=(2,), dtype=float32)

Once again, everything is fine.

2. Specifying batch_size

When the batch size is specified as you described, e.g.

inputs_train_ph = tf.placeholder(dtype=m_dtype, shape=[34, 75, num_features], name="inputs_train_ph")
inputs_devel_ph = tf.placeholder(dtype=m_dtype, shape=[34, 75, num_features], name="inputs_devel_ph")

your network DOES NOT work at all, which is not surprising, as your shapes do not match in the output namespace.

Furthermore, labels_devel_ph and labels_train_ph do not matter for the code in question, so why include them? They just clutter the question further. Please see Minimal, Complete and Verifiable Example; many parts here are not needed for the task in question.

Next, there is no connection between the batch shapes of inputs_train_ph and inputs_devel_ph. Why would there be? One is independent of the other, and only one is used at a time because of tf.cond (which has to be provided as a value in session run, but that's beyond the scope of this question).

The problematic output part is exactly this:

output = tf.reshape(output, shape=[5, -1, num_classes])
used = tf.expand_dims(used, 2)

output = output * used

Tensor shapes using your approach:

Output shape: (5, 480, 2)
Used initial shape: (32, 75)
Used after expand_dims: (32, 75, 1)

Obviously (5, 480, 2) multiplied by (32, 75, 1) will not work. I see no possibility of this ever working, even with a pre-alpha version of Tensorflow, which makes me think other parts of your source code are making it work, and who knows what else influences it, to be honest.
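The mismatch can be checked by hand with the usual NumPy-style broadcasting rule, which elementwise multiplication in Tensorflow follows as well. The helper below is a minimal pure-Python sketch; broadcast_shape is a hypothetical name, not a library function.

```python
from itertools import zip_longest


def broadcast_shape(a, b):
    """Return the broadcast result shape of `a` and `b`, or None if incompatible.

    Dimensions are compared from the trailing end; two sizes are compatible
    when they are equal or one of them is 1.
    """
    result = []
    for x, y in zip_longest(reversed(a), reversed(b), fillvalue=1):
        if x == y or x == 1 or y == 1:
            result.append(max(x, y))
        else:
            return None
    return tuple(reversed(result))


# The shapes from the question: 480 vs 75 already clashes.
print(broadcast_shape((5, 480, 2), (32, 75, 1)))
# The mask would broadcast fine against an output left in (batch, steps, classes).
print(broadcast_shape((32, 75, 2), (32, 75, 1)))
```

The first call reports an incompatible pair, while the second shows that the expand_dims mask is only usable if output keeps the (32, 75, 2) layout.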

The problem with used can be solved in multiple ways, but I think what you wanted is to stack used along another dimension and reshape afterwards (it will not work without reshaping):

output = tf.reshape(output, shape=[5, -1, num_classes])
used = tf.stack((used, used), 2)
used = tf.reshape(used, shape=(5, -1, num_classes))

output = output * used

Using this approach, every batch shape passes through the network without problems.
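As a sanity check, the stack-and-reshape idea can be reproduced outside Tensorflow. The sketch below uses NumPy as a stand-in (an assumption; the original code is Tensorflow 1.x) with random data in place of the dense layer output, and compares the result against masking in the natural (batch, steps, classes) layout.

```python
import numpy as np

batch, steps, num_classes = 32, 75, 2  # sizes taken from the shapes printed above

rng = np.random.default_rng(0)
# Stand-in for the dense layer output, shape (batch * steps, num_classes).
logits = rng.normal(size=(batch * steps, num_classes))
# Stand-in for the 0/1 mask `used`, shape (batch, steps).
used = rng.integers(0, 2, size=(batch, steps)).astype(logits.dtype)

# Approach from the answer: reshape the output, stack the mask once per class,
# then reshape the mask in exactly the same way.
output = logits.reshape(5, -1, num_classes)
mask = np.stack((used, used), axis=2).reshape(5, -1, num_classes)
masked = output * mask

# Reference: mask in (batch, steps, classes) layout first, then reshape.
reference = (logits.reshape(batch, steps, num_classes) * used[:, :, None]).reshape(
    5, -1, num_classes
)

print(masked.shape)
print(np.array_equal(masked, reference))
```

Both reshapes preserve row-major element order, so each mask entry still lines up with the element it belongs to. Note that stacking two copies hard-codes num_classes == 2, and the leading 5 only works when batch * steps is divisible by 5.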

BTW, I'm not entirely sure what you are trying to achieve with used at the beginning either. Maybe it fits your use case, but IMO the intent is totally unreadable compared to the final result (columns containing zero when all features at that position in the sequence are zero, and one otherwise).
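For readers trying to decode it, the used and length computation from the question can be mirrored on plain nested lists. This is only an illustrative sketch; used_mask and lengths are hypothetical helper names, and the tiny batch is made up.

```python
def used_mask(batch):
    """For each sequence, mark a time step 1.0 if any feature is non-zero, else 0.0.

    Mirrors sign(reduce_max(abs(inputs), axis=2)) on plain nested lists.
    """
    return [[1.0 if max(abs(x) for x in step) > 0 else 0.0 for step in seq]
            for seq in batch]


def lengths(mask):
    """Count the non-zero steps per sequence, like reduce_sum(used, axis=1)."""
    return [int(sum(row)) for row in mask]


# Two sequences of three steps with two features each; all-zero steps are padding.
batch = [
    [[0.5, -1.0], [0.0, 2.0], [0.0, 0.0]],
    [[1.0, 1.0], [0.0, 0.0], [0.0, 0.0]],
]
mask = used_mask(batch)
print(mask)           # [[1.0, 1.0, 0.0], [1.0, 0.0, 0.0]]
print(lengths(mask))  # [2, 1]
```

The count equals the true sequence length only when the zero padding is trailing; an all-zero step in the middle of a sequence would be counted as padding too.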

3. Tensorflow versions

Tested with version 1.8 of Tensorflow (current is 1.12, and 2.0 is around the corner), and I was able to reproduce your problem.

The output shapes still crash as described above, though.

Actually, version 1.9 already has this problem solved.

Why does this problem exist in 1.8?

I have tried to pinpoint a possible reason, but I still don't know. Some ideas:

Using tf.layers instead of tf.nn. As planned for version 2.0, this module will be deprecated, and given the amount of mess in the framework, I assumed something might be off in this case as well. I changed conv1d to its tf.keras.layers counterpart and batch normalization to tf.nn.batch_normalization, unluckily to no avail; the results are still the same.
According to the 1.9 release notes: "Prevent tf.gradients() from backpropagating through integer tensors". Maybe this has something to do with your problem? Maybe the backpropagation graph is somehow stuck in v1.8.0?
Problems with scoping in tf.variable_scope and tf.layers: as expected, nothing changed after removing the namespaces.

All in all: update your dependencies to version 1.9 or above. It solves all your problems (though why they occur is rather hard to grasp, as is this framework).
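If the code has to run in environments you don't control, a small version guard can flag the affected releases early. The sketch below is plain Python with hypothetical helper names; in a real script the argument would presumably be tf.__version__.

```python
import re


def version_tuple(version):
    """Extract the leading numeric components, e.g. '1.12.0-rc1' -> (1, 12, 0)."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError("no numeric version found in %r" % version)
    return tuple(int(part) for part in match.group(0).split("."))


def is_at_least(version, minimum=(1, 9)):
    """True if `version` is at or above `minimum` (tuple comparison)."""
    return version_tuple(version) >= minimum


# Hard-coded sample version strings; a real check would use tf.__version__.
for v in ["1.8.0", "1.9.0", "1.12.0"]:
    print(v, is_at_least(v))
```

Comparing integer tuples avoids the classic string-comparison trap where "1.12" sorts before "1.9".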

Actually, it's beyond me why such changes happen between minor versions and are not described in more depth in the changelog, but maybe I'm missing something big...

A little off-topic

Maybe you should consider using tf.Estimator, tf.keras, or another framework like PyTorch? This code is highly unreadable, hard to debug, and ugly. Newer practices exist, and they should help both you and us (if you have another problem with your neural networks and decide to ask on StackOverflow).

Answer 1: accepted, 2019-02-12 14:58:59