
Any idea on how to implement the attached LSTM RNN architecture in TensorFlow?

I wanted to use this example and extend it to implement the architecture in the following figure. The code uses tf.contrib.rnn.BasicLSTMCell and tf.contrib.rnn.static_rnn in the following way:

    import tensorflow as tf  # TensorFlow 1.x

    lstm_cell = tf.contrib.rnn.BasicLSTMCell(n_hidden)
    outputs, states = tf.contrib.rnn.static_rnn(lstm_cell, x, dtype=tf.float32,
                                                sequence_length=seqlen)

I printed "states" (and "outputs"), and I expected "states" to have the shape [number of input sequences, x], where x is the length of each input sequence. But when I print "states" (or "outputs"), both have the shape [number of input sequences, n_hidden], where n_hidden is the number of features in the hidden layer.

First of all, am I printing the hidden state for just one time step (maybe the last one) and not for the unrolled RNN? How can I print all the hidden states after the RNN processes each time step of the input sequence, to make sure I am implementing the following architecture?

Second, how would you implement the following architecture in TensorFlow? Suppose each x_i is a 12-bit binary vector and each input sequence contains at most 80 vectors. Each input sequence is paired with an output sequence, and the goal is to predict these output sequences from their corresponding input sequences. (A sketch of the tensor shapes this implies follows the figure.)

[Figure 1: the LSTM RNN architecture to be implemented]
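For concreteness, here is a minimal sketch of the tensor shapes this setup implies, assuming the TF 1.x placeholder style of the example linked above; the names X, Y, and seqlen, padding to 80 steps, and 12-bit output vectors are assumptions for illustration:

    import tensorflow as tf  # TF 1.x

    max_steps = 80  # each input sequence has at most 80 vectors
    n_input = 12    # each x_i is a 12-bit binary vector

    # Inputs are padded to max_steps; seqlen holds each sequence's true length.
    X = tf.placeholder(tf.float32, [None, max_steps, n_input])
    Y = tf.placeholder(tf.float32, [None, max_steps, n_input])  # paired output sequence (assumed 12-bit too)
    seqlen = tf.placeholder(tf.int32, [None])

    # static_rnn expects a Python list of per-step tensors, each [batch, n_input].
    x = tf.unstack(X, max_steps, axis=1)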

That's fishy.

The return from static_rnn is supposed to be all the outputs plus the final state (link).

So outputs should be a list with one entry per unrolled time step, each entry a tensor of shape [batch, n_hidden].

Use tf.nn.static_state_saving_rnn to save all the intermediate states, then print them like any other tensor in your model.
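A simpler route for inspection: for a BasicLSTMCell, the per-step output is the hidden state h_t, so stacking and evaluating the outputs list already gives you every hidden state. A minimal sketch, using the placeholders X and seqlen from the shape sketch above and hypothetical NumPy arrays batch_x and batch_len:

    # One hidden state h_t per unrolled step: shape [max_steps, batch, n_hidden].
    all_states = tf.stack(outputs)

    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        states_val = sess.run(all_states,
                              feed_dict={X: batch_x, seqlen: batch_len})
        for t, h_t in enumerate(states_val):
            print("step", t, "hidden states:", h_t.shape)  # (batch, n_hidden)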

As for the architecture question:

If you are supposed to return an output after every entry, then take the outputs and apply a loss that pushes them toward the labels you have (a sketch follows).
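A minimal sketch of that per-step loss, reusing the placeholder names from the shape sketch above; the shared dense projection and the sigmoid cross-entropy are assumptions, chosen because the targets are 12-bit binary vectors:

    # Stack per-step outputs into [batch, max_steps, n_hidden] and project each
    # step to 12 logits, one per bit of the target vector (weights are shared
    # across steps because the dense layer acts on the last axis only).
    hidden = tf.stack(outputs, axis=1)
    logits = tf.layers.dense(hidden, n_input)

    # Mask out padded steps so they do not contribute to the loss.
    mask = tf.sequence_mask(seqlen, max_steps, dtype=tf.float32)
    loss = tf.losses.sigmoid_cross_entropy(
        multi_class_labels=Y, logits=logits, weights=mask[..., tf.newaxis])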

If you are supposed to look at the entire sequence and then propose outputs, then you need a two-RNN system. You should have an encoding RNN, like the one you already have; we ignore its outputs for this part. Then you take the final state and feed it to a decoding RNN as its initial state. This one doesn't have any input of its own. We take the outputs of the decoding RNN and apply a loss to make them look like the labels (a sketch follows).
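A minimal encoder-decoder sketch along those lines, again TF 1.x; feeding zero vectors as the decoder's per-step input and reusing max_steps as the output length are assumptions:

    # Encoder: same as before, but keep only the final state.
    with tf.variable_scope("encoder"):
        enc_cell = tf.contrib.rnn.BasicLSTMCell(n_hidden)
        _, enc_state = tf.contrib.rnn.static_rnn(
            enc_cell, x, dtype=tf.float32, sequence_length=seqlen)

    # Decoder: no real input, so feed zeros at every step and start from the
    # encoder's final state instead.
    dec_inputs = [tf.zeros_like(x[0]) for _ in range(max_steps)]
    with tf.variable_scope("decoder"):
        dec_cell = tf.contrib.rnn.BasicLSTMCell(n_hidden)
        dec_outputs, _ = tf.contrib.rnn.static_rnn(
            dec_cell, dec_inputs, initial_state=enc_state)

    # Project dec_outputs and apply the loss exactly as in the per-step case.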

As always, you should try various setups (different layer sizes, different numbers of layers, and so on) and pick the best one.
