
How to feed the output of one LSTM, along with text, into another LSTM in TensorFlow?

I am trying to feed the output of one LSTM layer into another LSTM layer, along with the text intended for that second layer. The text provided to the two LSTMs is different, and my goal is for the second LSTM to improve its understanding of its text based on what the first LSTM understood.

I tried to implement it in TensorFlow like this:

# text inputs to the two LSTMs (each gets its own text)
rnn_inputs = tf.nn.embedding_lookup(embeddings, text_data)
rnn_inputs_2 = tf.nn.embedding_lookup(embeddings, text_data_2)
# first LSTM
lstm1Output, lstm1State = tf.nn.dynamic_rnn(cell=lstm1, 
        inputs=rnn_inputs, 
        sequence_length=input_lengths, 
        dtype=tf.float32, 
        time_major=False)
# second LSTM
lstm2Output, lstm2State = tf.nn.dynamic_rnn(cell=lstm2, 
        # use the input of the second LSTM and the first LSTM here
        inputs=rnn_inputs_2 + lstm1State, 
        sequence_length=input_lengths_2, 
        dtype=tf.float32, 
        time_major=False)

This has an issue, since rnn_inputs_2 has shape (batch_size, _, hidden_layer_size), while lstm1State has shape (batch_size, hidden_layer_size). Does anyone have an idea of how I can change the shapes to make this work, or is there a better way?

Thanks

You're interpreting the hidden state of LSTM1 as a sentence embedding (rightfully so). And you now want to pass that sentence embedding into LSTM2 as prior knowledge it can base its decisions on.

If I've described that correctly, then you're describing an encoder/decoder model, with the addition of new inputs to LSTM2. If that's accurate, my first approach would be to pass the hidden state of LSTM1 in as the initial state of LSTM2. That is far more logical than adding it to the input at every LSTM2 time step.
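Concretely, that means dropping the inputs=rnn_inputs_2 + lstm1State addition from the code in the question and passing lstm1State through dynamic_rnn's initial_state argument instead. A minimal sketch of the idea, written against tf.compat.v1 so it also runs under TF 2.x; the batch size, sequence length, and random stand-in inputs here are illustrative, and LSTM2's hidden size must match LSTM1's for the state to fit:

```python
import tensorflow as tf

tf1 = tf.compat.v1
tf1.disable_eager_execution()

batch_size, max_time, embed_dim, hidden = 4, 7, 16, 32

# random stand-ins for the two embedded text inputs
rnn_inputs = tf1.random_normal([batch_size, max_time, embed_dim])
rnn_inputs_2 = tf1.random_normal([batch_size, max_time, embed_dim])

# LSTM2 needs the same hidden size as LSTM1 so lstm1State fits it
lstm1 = tf1.nn.rnn_cell.LSTMCell(hidden)
lstm2 = tf1.nn.rnn_cell.LSTMCell(hidden)

with tf1.variable_scope("encoder"):
    lstm1Output, lstm1State = tf1.nn.dynamic_rnn(
        cell=lstm1, inputs=rnn_inputs, dtype=tf.float32)

with tf1.variable_scope("decoder"):
    # the encoder's final state seeds the decoder instead of being
    # added to its inputs, so no shape mismatch arises
    lstm2Output, lstm2State = tf1.nn.dynamic_rnn(
        cell=lstm2, inputs=rnn_inputs_2,
        initial_state=lstm1State)
```

Note that the two dynamic_rnn calls sit in separate variable scopes; otherwise their cell variables would collide under the default "rnn" scope.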

You would have the further benefit of an extra gradient path from LSTM2 through LSTM1's state back into LSTM1, so LSTM1 would be trained not only on its own loss function, but also on its ability to provide something LSTM2 can use to improve its loss (assuming you train both LSTM 1&2 in the same sess.run iteration).

With respect to the question:

Another question: what if I wanted to introduce an LSTM3 whose output should also affect LSTM2? In this case, would I just sum the LSTM3 and LSTM1 hidden states and set that as the initial state for LSTM2?

Summing sounds bad; concatenating sounds good. You control the hidden state size of LSTM2: it just needs a larger hidden state.
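To make the concatenation concrete: an LSTM state is an LSTMStateTuple of (c, h), so both components get concatenated along the feature axis, and LSTM2's hidden size must equal the sum of the two. A sketch with zero tensors standing in for the real final states (the sizes 32 and 24 are arbitrary; in the real graph lstm1State and lstm3State come out of the two dynamic_rnn calls):

```python
import tensorflow as tf

batch, h1, h3 = 4, 32, 24
make_state = tf.compat.v1.nn.rnn_cell.LSTMStateTuple

# stand-ins for the final states returned by the two encoder RNNs
lstm1State = make_state(c=tf.zeros([batch, h1]), h=tf.zeros([batch, h1]))
lstm3State = make_state(c=tf.zeros([batch, h3]), h=tf.zeros([batch, h3]))

# concatenate component-wise; LSTM2's hidden size must then be h1 + h3
lstm2_init = make_state(
    c=tf.concat([lstm1State.c, lstm3State.c], axis=1),
    h=tf.concat([lstm1State.h, lstm3State.h], axis=1))
```

lstm2_init can then be passed as the initial_state of an LSTMCell of size h1 + h3. Unlike summing, this keeps the two encoders' information in separate dimensions rather than mixing them irreversibly.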

And with respect to this question:

One thing I didn't mention earlier is that sometimes LSTM1 will receive no input, and obviously, since its input is a sentence, LSTM1 will receive different input every time. Would this impact the error updates for LSTM1 and LSTM2? Also, does this mean I can't use an encoder-decoder system? Otherwise, what you are saying makes sense; I am running it now and will see if it helps my performance.

In this case, if LSTM1 has no input (and thus no output state), I think the logical solution is to initialize LSTM2 with a hidden state vector of all zeros. This is what dynamic_rnn does under the hood if you don't give it an initial state, so explicitly passing a vector of zeros is equivalent.
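A sketch of that fallback: every RNN cell exposes zero_state, which builds exactly the all-zeros LSTMStateTuple that dynamic_rnn would create on its own (the batch and hidden sizes here are illustrative). You can then feed either this or lstm1State as initial_state, depending on whether LSTM1 ran:

```python
import tensorflow as tf

tf1 = tf.compat.v1
batch_size, hidden = 4, 32
lstm2 = tf1.nn.rnn_cell.LSTMCell(hidden)

# explicit all-zeros state; identical to what dynamic_rnn creates
# on its own when no initial_state is supplied
zero_init = lstm2.zero_state(batch_size, tf.float32)
```

Since the zero state carries no gradient back to LSTM1, the batches where LSTM1 has no input simply don't update it, which answers the concern about error updates.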

