How to feed back RNN output to input in tensorflow
Suppose I have a trained RNN (e.g. a language model) and I want to see what it would generate on its own. How should I feed its output back to its input?
I have read several related questions on this topic.
In theory it is clear to me that TensorFlow uses truncated backpropagation, so we have to define the maximum number of steps that we would like to "trace". We also reserve a dimension for batches, so if I'd like to train on a sine wave, I have to feed inputs of shape [None, num_step, 1].
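As a small illustration of that shape convention, here is a NumPy-only sketch that shapes one sine wave as [batch, num_step, 1] and builds next-step targets by shifting the sequence by one position. The names `inputs` and `targets` are assumptions for illustration, and the one-step shift is a common choice rather than necessarily the exact target used in the code of this post.

```python
import numpy as np

n_samples = 100  # assumed sequence length (num_step)

# One sine-wave sequence shaped as [batch, num_step, features].
x = np.sin(np.linspace(0, 10, n_samples))[None, :, None]

# Next-step targets: targets[:, t] is x[:, t + 1]. The last step has no
# target, so inputs and targets are both one step shorter than x.
inputs = x[:, :-1, :]
targets = x[:, 1:, :]

print(inputs.shape, targets.shape)
```

A batch of several sequences would simply stack along the first axis, which is why that dimension is left as None in the placeholder.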
The following code works:
import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf  # TensorFlow 1.x API

tf.reset_default_graph()
n_samples=100
state_size=5
lstm_cell = tf.nn.rnn_cell.BasicLSTMCell(state_size, forget_bias=1.)
def_x = np.sin(np.linspace(0, 10, n_samples))[None, :, None]
zero_x = np.zeros(n_samples)[None, :, None]
X = tf.placeholder_with_default(zero_x, [None, n_samples, 1])
output, last_states = tf.nn.dynamic_rnn(inputs=X, cell=lstm_cell, dtype=tf.float64)
pred = tf.contrib.layers.fully_connected(output, 1, activation_fn=tf.tanh)
Y = np.roll(def_x, 1)
loss = tf.reduce_sum(tf.pow(pred-Y, 2))/(2*n_samples)
opt = tf.train.AdamOptimizer().minimize(loss)
sess = tf.InteractiveSession()
tf.global_variables_initializer().run()
# Initial state run
plt.show(plt.plot(output.eval()[0]))
plt.plot(def_x.squeeze())
plt.show(plt.plot(pred.eval().squeeze()))
steps = 1001
for i in range(steps):
    p, l, _ = sess.run([pred, loss, opt])
The state size of the LSTM can be varied. I also experimented with feeding the network a sine wave and with feeding it zeros, and in both cases it converged in ~500 iterations. So far I have understood that in this case the graph consists of n_samples LSTM cells sharing their parameters, and that it is up to me to feed input to them as a time series. However, when generating samples the network explicitly depends on its previous output, meaning that I cannot feed the unrolled model all at once. I tried to compute the state and output at every step:
with tf.variable_scope('sine', reuse=True):
    X_test = tf.placeholder(tf.float64)
    X_reshaped = tf.reshape(X_test, [1, -1, 1])
    output, last_states = tf.nn.dynamic_rnn(lstm_cell, X_reshaped, dtype=tf.float64)
    pred = tf.contrib.layers.fully_connected(output, 1, activation_fn=tf.tanh)
test_vals = [0.]
for i in range(1000):
    val = pred.eval({X_test:np.array(test_vals)[None, :, None]})
    test_vals.append(val)
However, in this model there seems to be no continuity between the LSTM cells. What is going on here?
Do I have to initialize a zero array with, e.g., 100 time steps, and assign each run's result into the array? Like feeding the network with this:
run 0: input_feed = [0, 0, 0 ... 0]; res1 = result
run 1: input_feed = [res1, 0, 0 ... 0]; res2 = result
run 2: input_feed = [res1, res2, 0 ... 0]; res3 = result
etc...
What should I do if I want this trained network to use its own output as its input in the following time step?
If I understood you correctly, you want to find a way to feed the output of time step t as input to time step t+1, right? To do so, there is a relatively easy workaround that you can use at test time:
1. Make sure the input placeholder accepts a dynamic sequence length, i.e. the size of the time dimension is None.
2. Make sure you are using tf.nn.dynamic_rnn (which you do in the posted example).
3. Pass the initial state into dynamic_rnn.
I.e., change the definition of the model to something like this:
lstm_cell = tf.nn.rnn_cell.BasicLSTMCell(state_size, forget_bias=1.)
def_x = np.sin(np.linspace(0, 10, n_samples))[None, :, None]
zero_x = np.zeros(n_samples)[None, :, None]
# shape is [batch_size, seq_length, dimension of input]; seq_length is dynamic
X = tf.placeholder_with_default(zero_x, [None, None, 1])
batch_size = tf.shape(X)[0]
# the state dtype has to match the float64 inputs
initial_state = lstm_cell.zero_state(batch_size, dtype=tf.float64)
output, last_states = tf.nn.dynamic_rnn(inputs=X, cell=lstm_cell, dtype=tf.float64,
                                        initial_state=initial_state)
pred = tf.contrib.layers.fully_connected(output, 1, activation_fn=tf.tanh)
Then you can perform inference like so:
fetches = {'final_state': last_states,
           'prediction': pred}
toy_initial_input = np.array([[[1]]]) # put suitable data here
seq_length = 20 # put whatever is reasonable here for you
# get the output for the first time step
feed_dict = {X: toy_initial_input}
eval_out = sess.run(fetches, feed_dict)
outputs = [eval_out['prediction']]
next_state = eval_out['final_state']
for i in range(1, seq_length):
    # feed the previous prediction as input and carry over the RNN state
    feed_dict = {X: outputs[-1],
                 initial_state: next_state}
    eval_out = sess.run(fetches, feed_dict)
    outputs.append(eval_out['prediction'])
    next_state = eval_out['final_state']
# outputs now contains the sequence you want
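The same idea can be checked without TensorFlow. The following is a minimal NumPy sketch, assuming a plain tanh RNN with randomly initialized weights standing in for a trained model (`W_xh`, `W_hh`, `W_hy`, `rnn_step`, `generate` and `run_unrolled` are all hypothetical names, not part of the code above). It feeds each prediction back as the next input while carrying the hidden state, then confirms that running the unrolled network once over the generated sequence gives the same outputs.

```python
import numpy as np

rng = np.random.default_rng(0)
state_size = 5

# Hypothetical random weights standing in for a trained model.
W_xh = rng.normal(scale=0.5, size=(1, state_size))
W_hh = rng.normal(scale=0.5, size=(state_size, state_size))
W_hy = rng.normal(scale=0.5, size=(state_size, 1))

def rnn_step(x_t, h):
    """One time step: returns (prediction, new hidden state)."""
    h_new = np.tanh(x_t @ W_xh + h @ W_hh)
    y_t = np.tanh(h_new @ W_hy)
    return y_t, h_new

def generate(first_input, seq_length):
    """Feed each prediction back as the next input, carrying the state."""
    h = np.zeros((1, state_size))
    x_t = first_input
    outputs = []
    for _ in range(seq_length):
        y_t, h = rnn_step(x_t, h)
        outputs.append(y_t)
        x_t = y_t  # the output of step t becomes the input of step t+1
    return np.concatenate(outputs, axis=0)

def run_unrolled(inputs):
    """Run over a known input sequence in one pass, starting from a zero state."""
    h = np.zeros((1, state_size))
    ys = []
    for x_t in inputs:
        y_t, h = rnn_step(x_t[None, :], h)
        ys.append(y_t)
    return np.concatenate(ys, axis=0)

first = np.array([[1.0]])
generated = generate(first, seq_length=20)

# Inputs the unrolled network would have needed: the seed, then its own outputs.
inputs = np.concatenate([first, generated[:-1]], axis=0)
print(np.allclose(run_unrolled(inputs), generated))
```

The point of the comparison is that stepping one time step at a time is only equivalent to the unrolled run when the state is carried over between calls; restarting from a zero state on every call would break it.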
Note that this can also work for batches; however, it can be a bit more complicated if you have sequences of different lengths in the same batch.
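One possible way to handle different lengths in a batch is to keep stepping until the longest sequence is done and mask out the sequences that have already finished. The sketch below is only an illustration in NumPy, assuming a plain tanh RNN with random weights; `generate_batch` and `lengths` are hypothetical names.

```python
import numpy as np

rng = np.random.default_rng(0)
state_size = 4

# Hypothetical random weights standing in for a trained model.
W_xh = rng.normal(scale=0.5, size=(1, state_size))
W_hh = rng.normal(scale=0.5, size=(state_size, state_size))
W_hy = rng.normal(scale=0.5, size=(state_size, 1))

def rnn_step(x_t, h):
    """One batched time step: returns (predictions, new hidden states)."""
    h_new = np.tanh(x_t @ W_xh + h @ W_hh)
    return np.tanh(h_new @ W_hy), h_new

def generate_batch(first_inputs, lengths):
    """Generate lengths[b] steps for sequence b; remaining steps stay zero."""
    lengths = np.asarray(lengths)
    batch_size, max_len = len(lengths), int(lengths.max())
    h = np.zeros((batch_size, state_size))
    x_t = first_inputs                   # shape [batch, 1]
    outputs = np.zeros((batch_size, max_len, 1))
    for t in range(max_len):
        y_t, h_new = rnn_step(x_t, h)
        active = (t < lengths)[:, None]  # [batch, 1] mask of unfinished rows
        outputs[:, t, :] = np.where(active, y_t, 0.0)
        h = np.where(active, h_new, h)   # freeze the state of finished rows
        x_t = y_t
    return outputs

out = generate_batch(np.array([[1.0], [0.5]]), lengths=[3, 5])
print(out.shape)
```

Here the first sequence stops contributing after 3 steps while the second runs for 5, and the padded positions are left at zero.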
If you want to perform this kind of prediction not only at test time but also at training time, that is possible too, but it is a bit more complicated to implement.
You can use its own output (last state) as the next-step input (initial state). One way to do this is to:
The second can be done by either:
I know I'm a bit late to the party, but I think this gist could be useful:
https://gist.github.com/CharlieCodex/f494b27698157ec9a802bc231d8dcf31
It lets you automatically feed the output through a filter and back into the network as input. To make the shapes match up, processing can be set to a tf.layers.Dense layer.
Please ask any questions!
Edit:
In your particular case, create a lambda which performs the processing of the dynamic_rnn outputs into your character vector space. Ex:
# if you have:
W = tf.Variable( ... )
B = tf.Variable( ... )
Yo, Ho = tf.nn.dynamic_rnn( cell , inputs , state )
logits = tf.matmul(W, Yo) + B
...
# use self_feeding_rnn as
process_yo = lambda Yo: tf.matmul(W, Yo) + B
Yo, Ho = self_feeding_rnn( cell, seed, initial_state, processing=process_yo)
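To make the role of processing concrete, here is a NumPy-only sketch of the general pattern. It is not the gist's implementation: `self_feeding_loop`, `step_fn` and the weight shapes are hypothetical, and the cell is a plain tanh RNN whose output is its hidden state. The processing callable projects the cell output back to the input size so that it can be fed in again.

```python
import numpy as np

def self_feeding_loop(step_fn, seed, initial_state, processing, n_steps):
    """Hypothetical helper: run step_fn repeatedly, passing
    processing(output) back in as the next input."""
    x_t, state = seed, initial_state
    outputs = []
    for _ in range(n_steps):
        y_t, state = step_fn(x_t, state)
        x_t = processing(y_t)  # map the cell output back to input space
        outputs.append(x_t)
    return np.stack(outputs, axis=1), state

rng = np.random.default_rng(0)
state_size, input_size = 6, 3

# Hypothetical random weights standing in for a trained cell and projection.
W_xh = rng.normal(scale=0.5, size=(input_size, state_size))
W_hh = rng.normal(scale=0.5, size=(state_size, state_size))
W = rng.normal(scale=0.5, size=(state_size, input_size))
B = np.zeros(input_size)

def step_fn(x_t, h):
    """Simple RNN cell: the output is the new hidden state."""
    h_new = np.tanh(x_t @ W_xh + h @ W_hh)
    return h_new, h_new

# Dense projection: [batch, state_size] -> [batch, input_size]
process_yo = lambda Yo: Yo @ W + B

seed = np.ones((1, input_size))
outputs, final_state = self_feeding_loop(
    step_fn, seed, np.zeros((1, state_size)), process_yo, n_steps=10)
print(outputs.shape, final_state.shape)
```

Without the projection, the cell output (size 6 here) could not be fed back as an input of size 3, which is why a dense layer is a natural choice for processing.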