How to feed back RNN output to input in TensorFlow

Suppose I have a trained RNN (e.g. a language model) and I want to see what it would generate on its own. How should I feed its output back to its input?

I have read the related questions on this topic.

In theory it is clear to me that in TensorFlow we use truncated backpropagation, so we have to define the maximum number of steps we would like to "trace". We also reserve a dimension for batches, so if I want to train on a sine wave, I have to feed inputs of shape [None, num_step, 1].
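As an illustration of the [None, num_step, 1] layout, here is a minimal, framework-free sketch that slices a sine wave into windows with next-step targets. The helper name make_windows and the window length of 20 are assumptions made for this example; they do not appear in the question.

```python
import math

def make_windows(series, num_steps):
    """Slice a 1-D series into inputs and next-step targets.

    Both results are nested lists shaped [batch, num_steps, 1].
    """
    inputs, targets = [], []
    for start in range(len(series) - num_steps):
        window = series[start:start + num_steps]
        shifted = series[start + 1:start + num_steps + 1]  # one step ahead
        inputs.append([[v] for v in window])
        targets.append([[v] for v in shifted])
    return inputs, targets

n_samples = 100
# Same grid as np.linspace(0, 10, n_samples), written without NumPy
series = [math.sin(10 * i / (n_samples - 1)) for i in range(n_samples)]
inputs, targets = make_windows(series, num_steps=20)
print(len(inputs), len(inputs[0]), len(inputs[0][0]))
```

Each target window is the input window shifted one step into the future, which is the pairing a model needs if its output is later fed back as its next input.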

The following code works:

import numpy as np
import tensorflow as tf  # TensorFlow 1.x API
import matplotlib.pyplot as plt

tf.reset_default_graph()
n_samples = 100
state_size = 5

lstm_cell = tf.nn.rnn_cell.BasicLSTMCell(state_size, forget_bias=1.)
# Sine wave and all-zero input, both shaped [batch=1, time=n_samples, features=1]
def_x = np.sin(np.linspace(0, 10, n_samples))[None, :, None]
zero_x = np.zeros(n_samples)[None, :, None]
X = tf.placeholder_with_default(zero_x, [None, n_samples, 1])
output, last_states = tf.nn.dynamic_rnn(inputs=X, cell=lstm_cell, dtype=tf.float64)

# Project each LSTM output to a single value in [-1, 1]
pred = tf.contrib.layers.fully_connected(output, 1, activation_fn=tf.tanh)

# Targets: the sine wave shifted right by one step, so Y[t] == def_x[t-1] (np.roll wraps around)
Y = np.roll(def_x, 1)
loss = tf.reduce_sum(tf.pow(pred - Y, 2)) / (2 * n_samples)

opt = tf.train.AdamOptimizer().minimize(loss)
sess = tf.InteractiveSession()
tf.global_variables_initializer().run()

# Initial state run
plt.plot(output.eval()[0])
plt.show()
plt.plot(def_x.squeeze())
plt.plot(pred.eval().squeeze())
plt.show()

steps = 1001
for i in range(steps):
    p, l, _ = sess.run([pred, loss, opt])

The state size of the LSTM can be varied. I also experimented with feeding the network either the sine wave or zeros, and in both cases it converged in ~500 iterations. So far I have understood that in this case the graph consists of n_samples LSTM cells sharing their parameters, and it is up to me to feed input to them as a time series. However, when generating samples the network explicitly depends on its previous output, meaning that I cannot feed the unrolled model all at once. I tried to compute the state and output at every step:

with tf.variable_scope('sine', reuse=True):
    X_test = tf.placeholder(tf.float64)
    X_reshaped = tf.reshape(X_test, [1, -1, 1])
    output, last_states = tf.nn.dynamic_rnn(lstm_cell, X_reshaped, dtype=tf.float64)
    pred = tf.contrib.layers.fully_connected(output, 1, activation_fn=tf.tanh)

    test_vals = [0.]
    for i in range(1000):
        # Re-run the whole sequence generated so far, starting from a zero state
        val = pred.eval({X_test: np.array(test_vals)[None, :, None]})
        # Keep only the prediction for the last time step
        test_vals.append(val[0, -1, 0])

However, in this model it seems that there is no continuity between the LSTM cells. What is going on here?

Do I have to initialize a zero array with, e.g., 100 time steps and assign each run's result into the array? Like feeding the network with this:

run 0: input_feed = [0, 0, 0 ... 0]; res1 = result

run 1: input_feed = [res1, 0, 0 ... 0]; res2 = result

run 2: input_feed = [res1, res2, 0 ... 0]; res3 = result

etc.

What should I do if I want this trained network to use its own output as its input in the following time step?

If I understood you correctly, you want to feed the output of time step t as the input to time step t+1, right? There is a relatively easy workaround that you can use at test time:

  1. Make sure your input placeholders can accept a dynamic sequence length, i.e. the size of the time dimension is None.
  2. Make sure you are using tf.nn.dynamic_rnn (which you do in the posted example).
  3. Pass the initial state into dynamic_rnn.
  4. Then, at test time, you can loop through your sequence and feed each time step individually (i.e. max sequence length is 1). Additionally, you just have to carry over the internal state of the RNN. See the pseudo code below (the variable names refer to your code snippet).

That is, change the definition of the model to something like this:

lstm_cell = tf.nn.rnn_cell.BasicLSTMCell(state_size, forget_bias=1.)
def_x = np.sin(np.linspace(0, 10, n_samples))[None, :, None]
zero_x = np.zeros(n_samples)[None, :, None]
X = tf.placeholder_with_default(zero_x, [None, None, 1])  # [batch_size, seq_length, dimension of input]
batch_size = tf.shape(X)[0]
# The state dtype must match X (float64); the state can be overridden through feed_dict
initial_state = lstm_cell.zero_state(batch_size, dtype=tf.float64)
output, last_states = tf.nn.dynamic_rnn(inputs=X, cell=lstm_cell, dtype=tf.float64,
    initial_state=initial_state)
pred = tf.contrib.layers.fully_connected(output, 1, activation_fn=tf.tanh)

Then you can perform inference like so:

# last_states is the final state returned by dynamic_rnn in the model definition
fetches = {'final_state': last_states,
           'prediction': pred}

toy_initial_input = np.array([[[1]]])  # put suitable data here
seq_length = 20  # put whatever is reasonable here for you

# get the output for the first time step
feed_dict = {X: toy_initial_input}
eval_out = sess.run(fetches, feed_dict)
outputs = [eval_out['prediction']]
next_state = eval_out['final_state']

for i in range(1, seq_length):
    feed_dict = {X: outputs[-1],
                 initial_state: next_state}
    eval_out = sess.run(fetches, feed_dict)
    outputs.append(eval_out['prediction'])
    next_state = eval_out['final_state']

# outputs now contains the sequence you want

Note that this can also work for batches, but it gets a bit more complicated if you have sequences of different lengths in the same batch.
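To see why carrying the state over matters, here is a minimal, framework-free sketch with a toy scalar RNN. The cell, its weights (0.5 and 0.9) and the identity readout are stand-ins invented for this illustration, not the LSTM from the question. It compares three generation loops: feeding one step at a time while carrying the state, re-feeding the whole generated prefix from a zero state, and feeding one step at a time without carrying the state.

```python
import math

def cell_step(x, h, w_x=0.5, w_h=0.9):
    """One step of a toy scalar RNN; the weights are arbitrary stand-ins."""
    return math.tanh(w_x * x + w_h * h)

def readout(h):
    """Maps the hidden state to the next input (stand-in for the dense layer)."""
    return h

def generate_with_state(seed, steps):
    """Feed one step at a time and carry the hidden state forward."""
    h, x, out = 0.0, seed, []
    for _ in range(steps):
        h = cell_step(x, h)
        x = readout(h)
        out.append(x)
    return out

def generate_by_refeeding(seed, steps):
    """Re-run the whole prefix from a zero state and keep the last output."""
    inputs, out = [seed], []
    for _ in range(steps):
        h = 0.0
        for x in inputs:
            h = cell_step(x, h)
        y = readout(h)
        out.append(y)
        inputs.append(y)
    return out

def generate_without_state(seed, steps):
    """Feed one step at a time but reset the state on every call."""
    x, out = seed, []
    for _ in range(steps):
        x = readout(cell_step(x, 0.0))
        out.append(x)
    return out

carried = generate_with_state(1.0, 10)
refed = generate_by_refeeding(1.0, 10)
reset = generate_without_state(1.0, 10)
print(max(abs(a - b) for a, b in zip(carried, refed)))
print(max(abs(a - b) for a, b in zip(carried, reset)))
```

The first two loops produce the same sequence, but re-feeding the prefix redoes all earlier steps on every call. The third loop diverges from the second step onward because the memory of the sequence is lost.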

If you want to perform this kind of prediction not only at test time but also at training time, that is possible too, but a bit more complicated to implement.

You can use its own output (last state) as the next-step input (initial state). One way to do this is to:

  1. use zero-initialized variables as the input state at every time step
  2. each time you complete a truncated sequence and get an output state, update the state variables with that output state.

The second step can be done by either:

  1. fetching the states to Python and feeding them back next time, as done in the ptb example in tensorflow/models
  2. building an update op in the graph and adding a dependency, as done in the ptb example in tensorpack.
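The first option (fetch the state, feed it back on the next run) can be sketched without any framework. The toy scalar cell, its weights, and the chunk length of 20 below are assumptions made for this illustration; the point is only that processing a long sequence in truncated chunks matches a single pass, provided each chunk starts from the state the previous chunk ended with.

```python
import math

def run_chunk(xs, h, w_x=0.5, w_h=0.9):
    """Run a toy scalar RNN over xs, starting from state h.

    Returns the per-step outputs and the final state.
    """
    outputs = []
    for x in xs:
        h = math.tanh(w_x * x + w_h * h)
        outputs.append(h)
    return outputs, h

sequence = [math.sin(0.1 * t) for t in range(100)]
num_steps = 20  # length of each truncated chunk

# Reference: one pass over the whole sequence
full_outputs, _ = run_chunk(sequence, 0.0)

# Truncated processing: keep the final state and feed it into the next chunk
state = 0.0
chunked_outputs = []
for start in range(0, len(sequence), num_steps):
    outs, state = run_chunk(sequence[start:start + num_steps], state)
    chunked_outputs.extend(outs)

print(max(abs(a - b) for a, b in zip(full_outputs, chunked_outputs)))
```

Resetting state to zero at the start of every chunk instead would make the outputs differ from the single pass after the first chunk.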

I know I'm a bit late to the party, but I think this gist could be useful:

https://gist.github.com/CharlieCodex/f494b27698157ec9a802bc231d8dcf31

It lets you automatically feed the output through a filter and back into the network as input. To make the shapes match up, processing can be set to a tf.layers.Dense layer.

Please ask any questions!

Edit:

In your particular case, create a lambda which maps the dynamic_rnn outputs into your character vector space. For example:

# if you have:
W = tf.Variable( ... )
B = tf.Variable( ... )
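# the third positional argument of dynamic_rnn is sequence_length, so pass the state by keyword
Yo, Ho = tf.nn.dynamic_rnn(cell, inputs, initial_state=state)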
logits = tf.matmul(W, Yo) + B
 ...
# use self_feeding_rnn as
process_yo = lambda Yo: tf.matmul(W, Yo) + B
Yo, Ho = self_feeding_rnn( cell, seed, initial_state, processing=process_yo)
