
Use hidden states instead of outputs in Keras LSTMs

I want to use an implementation of the attention mechanism by Yang et al. I found a working implementation of a custom layer that uses this attention mechanism here. Instead of using the output values of my LSTM:

my_lstm = LSTM(128, input_shape=(a, b), return_sequences=True)
my_lstm = AttentionWithContext()(my_lstm)
out = Dense(2, activation='softmax')(my_lstm)

I would like to use the hidden states of the LSTM:

my_lstm = LSTM(128, input_shape=(a, b), return_state=True)
my_lstm = AttentionWithContext()(my_lstm)
out = Dense(2, activation='softmax')(my_lstm)

But I get the error:

TypeError: can only concatenate tuple (not "int") to tuple

I tried it in combination with return_sequences, but everything I've tried has failed so far. How can I modify the returned tensors so that I can use them like the returned output sequences?

Thanks!

I think your confusion possibly stems from the Keras documentation being a little unclear.

return_sequences: Boolean. Whether to return the last output in the output sequence, or the full sequence.
return_state: Boolean. Whether to return the last state in addition to the output.

The docs on return_state are especially confusing because they imply that the hidden states are different from the outputs, but they are one and the same. For LSTMs this gets a little murky because, in addition to the hidden (output) states, there is the cell state. We can confirm this by looking at the LSTM step function in the Keras code:

class LSTM(Recurrent):
    def step(...):
        ...
        return h, [h, c]

The return type of this step function is output, states. So we can see that the hidden state h is actually the output, and for the states we get both the hidden state h and the cell state c. This is why you see the Wiki article you linked using the terms "hidden" and "output" interchangeably.
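To make this concrete, here is a minimal sketch (the dimensions are hypothetical) of what return_state=True gives you with the functional API: the layer returns a list of three tensors, and the first one (the output) is identical to the returned hidden state. This is also presumably why your second snippet fails: AttentionWithContext suddenly receives a list of tensors instead of a single tensor.

from keras.layers import Input, LSTM
from keras.models import Model
import numpy as np

a, b = 10, 8  # hypothetical timesteps and feature dimension

inputs = Input(shape=(a, b))
# With return_state=True the layer returns a *list* of three tensors:
# [output, last hidden state h, last cell state c]
output, state_h, state_c = LSTM(128, return_state=True)(inputs)

model = Model(inputs, [output, state_h, state_c])
out, h, c = model.predict(np.random.rand(1, a, b))

print(out.shape, h.shape, c.shape)  # (1, 128) (1, 128) (1, 128)
print(np.allclose(out, h))          # True: the output *is* the hidden state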

Looking a little more closely at the paper you linked, it seems to me that your original implementation is what you want.

my_lstm = LSTM(128, input_shape=(a, b), return_sequences=True)
my_lstm = AttentionWithContext()(my_lstm)
out = Dense(2, activation='softmax')(my_lstm)

This will pass the hidden state at each timestep to your attention layer. The only scenario where you are out of luck is the one where you actually want to pass the cell state from each timestep to your attention layer (which is what I thought initially), but I do not think this is what you want. The paper you linked actually uses a GRU layer, which has no concept of a cell state, and whose step function also returns the hidden state as the output.

class GRU(Recurrent):
    def step(...):
        ...
        return h, [h]

So the paper is almost certainly referring to the hidden states (aka outputs) and not the cell states.
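If you want a model that mirrors the paper's encoder more closely, a hypothetical end-to-end sketch might look like the following. It assumes AttentionWithContext is the custom layer from the linked gist (imported into scope) and that a and b are your timestep and feature dimensions:

from keras.layers import Input, GRU, Dense
from keras.models import Model
# AttentionWithContext is assumed to be the custom layer from the linked gist,
# e.g. from attention_with_context import AttentionWithContext  (hypothetical module path)

a, b = 100, 300  # hypothetical timesteps and feature dimension

inputs = Input(shape=(a, b))
# return_sequences=True hands the hidden state of every timestep to the attention layer
my_gru = GRU(128, return_sequences=True)(inputs)
attended = AttentionWithContext()(my_gru)
out = Dense(2, activation='softmax')(attended)

model = Model(inputs, out)
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])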

Just to add one point to Nicole's answer:

If we use the combination of return_state=True and return_sequences=True in an LSTM, then the first returned tensor is the hidden state (aka output) at every time step, of shape (batch, timesteps, units), whereas the second returned tensor is the hidden state at the last time step only, of shape (batch, units): a vector per sample, not a scalar.
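A small sketch (hypothetical shapes) to make the two return values concrete:

from keras.layers import Input, LSTM
from keras.models import Model
import numpy as np

a, b = 10, 8  # hypothetical timesteps and feature dimension

inputs = Input(shape=(a, b))
seq, state_h, state_c = LSTM(128, return_sequences=True,
                             return_state=True)(inputs)

model = Model(inputs, [seq, state_h, state_c])
s, h, c = model.predict(np.random.rand(1, a, b))

print(s.shape)                      # (1, 10, 128): hidden state at every timestep
print(h.shape)                      # (1, 128):     hidden state at the last timestep only
print(np.allclose(s[:, -1, :], h))  # True: the last step of the sequence equals state_h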
