How to feed back RNN output to input in tensorflow

Suppose I have a trained RNN (e.g. a language model) and I want to see what it would generate on its own. How should I feed its output back into its input?

I read the following related questions:

Theoretically it is clear to me that TensorFlow uses truncated backpropagation, so we have to define the maximum number of steps we would like to "trace". We also reserve a dimension for batches, so if I want to train on a sine wave, I have to feed inputs of shape [None, num_step, 1].
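To make the [None, num_step, 1] layout concrete, here is a minimal pure-Python sketch that cuts a sine wave into windows of `num_step` values. The helper `make_windows` and the choice of next-step targets are assumptions made for illustration only; they are not taken from the code in this question.

```python
import math

def make_windows(series, num_step):
    """Cut a 1-D series into non-overlapping windows shaped [batch, num_step, 1].

    Targets are the same windows shifted one step ahead (an assumption
    made for this illustration).
    """
    inputs, targets = [], []
    for start in range(0, len(series) - num_step, num_step):
        window = series[start:start + num_step]
        shifted = series[start + 1:start + num_step + 1]
        inputs.append([[v] for v in window])    # [num_step, 1]
        targets.append([[v] for v in shifted])  # [num_step, 1]
    return inputs, targets

# 100 samples of sin(x) on [0, 10], like np.sin(np.linspace(0, 10, 100))
series = [math.sin(10 * i / 99) for i in range(100)]
inputs, targets = make_windows(series, num_step=20)
```

The outer list plays the role of the batch dimension (the `None`), the middle one is the time dimension, and the innermost one is the single input feature.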

The following code works:

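import numpy as np
import tensorflow as tf  # TensorFlow 1.x API (tf.placeholder, tf.contrib)
import matplotlib.pyplot as plt

tf.reset_default_graph()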
n_samples=100

state_size=5

lstm_cell = tf.nn.rnn_cell.BasicLSTMCell(state_size, forget_bias=1.)
def_x = np.sin(np.linspace(0, 10, n_samples))[None, :, None]
zero_x = np.zeros(n_samples)[None, :, None]
X = tf.placeholder_with_default(zero_x, [None, n_samples, 1])
output, last_states = tf.nn.dynamic_rnn(inputs=X, cell=lstm_cell, dtype=tf.float64)

pred = tf.contrib.layers.fully_connected(output, 1, activation_fn=tf.tanh)

Y = np.roll(def_x, 1)
loss = tf.reduce_sum(tf.pow(pred-Y, 2))/(2*n_samples)


opt = tf.train.AdamOptimizer().minimize(loss)
sess = tf.InteractiveSession()
tf.global_variables_initializer().run()

# Initial state run
plt.show(plt.plot(output.eval()[0]))
plt.plot(def_x.squeeze())
plt.show(plt.plot(pred.eval().squeeze()))

steps = 1001
for i in range(steps):
    p, l, _= sess.run([pred, loss, opt])

The state size of the LSTM can be varied. I also experimented with feeding the network a sine wave and with feeding it zeros, and in both cases it converged in ~500 iterations. So far I have understood that in this case the graph consists of n_samples LSTM cells sharing their parameters, and that it is up to me to feed them the input as a time series. However, when generating samples the network explicitly depends on its previous output, meaning that I cannot feed the unrolled model all at once. I tried to compute the state and output at every step:

with tf.variable_scope('sine', reuse=True):
    X_test = tf.placeholder(tf.float64)
    X_reshaped = tf.reshape(X_test, [1, -1, 1])
    output, last_states = tf.nn.dynamic_rnn(lstm_cell, X_reshaped, dtype=tf.float64)
    pred = tf.contrib.layers.fully_connected(output, 1, activation_fn=tf.tanh)


    test_vals = [0.]
    for i in range(1000):
        val = pred.eval({X_test:np.array(test_vals)[None, :, None]})
        test_vals.append(val)

However, in this model there seems to be no continuity between the LSTM cells. What is going on here?

Do I have to initialize a zero array with, say, 100 time steps and assign each run's result into the array? That is, feed the network like this:

run 0: input_feed = [0, 0, 0 ... 0]; res1 = result

run 1: input_feed = [res1, 0, 0 ... 0]; res2 = result

run 2: input_feed = [res1, res2, 0 ... 0]; res3 = result

etc...

What should I do if I want this trained network to use its own output as its input at the following time step?

If I understood you correctly, you want to find a way to feed the output of time step t as input to time step t+1, right? To do so, there is a relatively easy workaround that you can use at test time:

  1. Make sure your input placeholders can accept a dynamic sequence length, i.e. the size of the time dimension is None.
  2. Make sure you are using tf.nn.dynamic_rnn (which you do in the posted example).
  3. Pass the initial state into dynamic_rnn.
  4. Then, at test time, you can loop through your sequence and feed each time step individually (i.e. the max sequence length is 1). Additionally, you just have to carry over the internal state of the RNN. See the pseudocode below (the variable names refer to your code snippet).

I.e., change the definition of the model to something like this:

lstm_cell = tf.nn.rnn_cell.BasicLSTMCell(state_size, forget_bias=1.)
def_x = np.sin(np.linspace(0, 10, n_samples))[None, :, None]
zero_x = np.zeros(n_samples)[None, :, None]
# the time dimension is None, so sequences of any length (including 1) can be fed
X = tf.placeholder_with_default(zero_x, [None, None, 1])  # [batch_size, seq_length, dimension of input]
batch_size = tf.shape(X)[0]
# the state dtype must match the float64 inputs
initial_state = lstm_cell.zero_state(batch_size, dtype=tf.float64)
output, last_states = tf.nn.dynamic_rnn(inputs=X, cell=lstm_cell, dtype=tf.float64,
    initial_state=initial_state)
pred = tf.contrib.layers.fully_connected(output, 1, activation_fn=tf.tanh)

Then you can perform inference like so:

fetches = {'final_state': last_states,
           'prediction': pred}

toy_initial_input = np.array([[[1]]])  # put suitable data here
seq_length = 20  # put whatever is reasonable here for you

# get the output for the first time step
feed_dict = {X: toy_initial_input}
eval_out = sess.run(fetches, feed_dict)
outputs = [eval_out['prediction']]
next_state = eval_out['final_state']

for i in range(1, seq_length):
    feed_dict = {X: outputs[-1],
                 initial_state: next_state}
    eval_out = sess.run(fetches, feed_dict)
    outputs.append(eval_out['prediction'])
    next_state = eval_out['final_state']

# outputs now contains the sequence you want
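
To see why carrying the state matters, here is a minimal pure-Python sketch of the same loop. It replaces the LSTM with a toy scalar cell; `cell_step`, `run`, `generate` and the weights 0.5 and 0.8 are hypothetical stand-ins, not TensorFlow API. Each call to `run` plays the role of one `sess.run` with a sequence of length 1.

```python
import math

def cell_step(x, h, w_x=0.5, w_h=0.8):
    """One time step of a toy scalar RNN cell; returns (output, new_state)."""
    h_new = math.tanh(w_x * x + w_h * h)
    return h_new, h_new

def run(xs, state=0.0):
    """Stand-in for one sess.run call: process the sequence xs starting from `state`."""
    outs = []
    for x in xs:
        y, state = cell_step(x, state)
        outs.append(y)
    return outs, state

def generate(first_input, seq_length, carry_state=True):
    """Feed one time step per call and reuse each prediction as the next input."""
    outs, state = run([first_input])
    outputs = [outs[-1]]
    for _ in range(1, seq_length):
        start = state if carry_state else 0.0  # 0.0 means the state is forgotten between calls
        outs, state = run([outputs[-1]], start)
        outputs.append(outs[-1])
    return outputs

with_state = generate(1.0, seq_length=5)
without_state = generate(1.0, seq_length=5, carry_state=False)
```

The two lists agree on the first value only. Without the carried state every call starts from zeros, which would correspond to the missing continuity described in the question.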

Note that this can also work for batches; however, it can be a bit more complicated if you have sequences of different lengths in the same batch.

If you want to perform this kind of prediction not only at test time, but also at training time, it is also possible to do, but a bit more complicated to implement.

You can use the network's own output (last state) as the next step's input (initial state). One way to do this is to:

  1. use zero-initialized variables as the input state at every time step
  2. each time you complete a truncated sequence and get an output state, update the state variables with that output state.

The second can be done by either:

  1. fetching the states to Python and feeding them back next time, as done in the ptb example in tensorflow/models
  2. building an update op in the graph and adding a dependency, as done in the ptb example in tensorpack.
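
The two steps above can be sketched in plain Python. This is only an illustration under stated assumptions: `StatefulToyRNN` and `run_in_chunks` are hypothetical names, the cell is a toy scalar tanh cell rather than an LSTM, and the assignment at the end of `run_chunk` stands in for the update op.

```python
import math

class StatefulToyRNN:
    """Toy scalar RNN that keeps its state between truncated sequences."""

    def __init__(self, w_x=0.5, w_h=0.8):
        self.w_x, self.w_h = w_x, w_h
        self.state = 0.0  # zero-initialized state "variable"

    def reset(self):
        self.state = 0.0

    def run_chunk(self, xs):
        """Process one truncated sequence, starting from the stored state."""
        h = self.state
        outs = []
        for x in xs:
            h = math.tanh(self.w_x * x + self.w_h * h)
            outs.append(h)
        self.state = h  # the "update op": keep the final state for the next chunk
        return outs

def run_in_chunks(rnn, xs, chunk_len):
    """Feed a long sequence as consecutive truncated sequences."""
    rnn.reset()
    outs = []
    for start in range(0, len(xs), chunk_len):
        outs.extend(rnn.run_chunk(xs[start:start + chunk_len]))
    return outs

xs = [math.sin(0.1 * i) for i in range(20)]
rnn = StatefulToyRNN()
chunked = run_in_chunks(rnn, xs, chunk_len=5)
whole = run_in_chunks(rnn, xs, chunk_len=20)
```

Because the state is stored after every chunk, feeding four chunks of 5 steps gives the same outputs as feeding all 20 steps at once.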

I know I'm a bit late to the party but I think this gist could be useful:

https://gist.github.com/CharlieCodex/f494b27698157ec9a802bc231d8dcf31

It lets you automatically feed the output through a filter and back into the network as input. To make the shapes match up, the processing step can be set to a tf.layers.Dense layer.

Please ask any questions!

Edit:

In your particular case, create a lambda which maps the dynamic_rnn outputs into your character vector space. For example:

# if you have:
W = tf.Variable( ... )
B = tf.Variable( ... )
Yo, Ho = tf.nn.dynamic_rnn( cell , inputs , state )
logits = tf.matmul(W, Yo) + B
 ...
# use self_feeding_rnn as
process_yo = lambda Yo: tf.matmul(W, Yo) + B
Yo, Ho = self_feeding_rnn( cell, seed, initial_state, processing=process_yo)
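
For readers who cannot open the gist, the idea of a processing callback can be sketched in plain Python. This is not the gist's implementation: `self_feed`, `toy_cell` and all weights below are hypothetical, and the cell is a toy with a 2-element state whose output has to be mapped back to a scalar input.

```python
import math

def toy_cell(x, h):
    """Toy cell: scalar input, 2-element state; the output equals the new state."""
    h_new = [math.tanh(0.5 * x + 0.8 * h[0] - 0.3 * h[1]),
             math.tanh(-0.4 * x + 0.2 * h[0] + 0.7 * h[1])]
    return h_new, h_new

def self_feed(cell, seed, initial_state, processing, n_steps):
    """Run the cell n_steps times, feeding processing(output) back in as the next input."""
    x, h = seed, initial_state
    generated = []
    for _ in range(n_steps):
        y, h = cell(x, h)
        x = processing(y)  # map the cell output back into the input space
        generated.append(x)
    return generated, h

# Plays the role of the dense layer: 2-element output -> scalar input.
process = lambda y: 0.6 * y[0] + 0.4 * y[1]

generated, final_state = self_feed(toy_cell, 1.0, [0.0, 0.0], process, n_steps=4)
```

The callback is what makes the shapes match: the cell emits a vector of the state size, while its input is a single value.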
