
How can I set up a Dense bottleneck in a stacked LSTM with Keras?

I have:

        self.model.add(Bidirectional(LSTM(lstm1_size, input_shape=(
            seq_length, feature_dim), return_sequences=True)))
        self.model.add(BatchNormalization())
        self.model.add(Dropout(0.2))

        self.model.add(Bidirectional(
            LSTM(lstm2_size, return_sequences=True)))
        self.model.add(BatchNormalization())
        self.model.add(Dropout(0.2))

        # BOTTLENECK HERE

        self.model.add(Bidirectional(
            LSTM(lstm3_size, return_sequences=True)))
        self.model.add(BatchNormalization())
        self.model.add(Dropout(0.2))

        self.model.add(Bidirectional(
            LSTM(lstm4_size, return_sequences=True)))
        self.model.add(BatchNormalization())
        self.model.add(Dropout(0.2))

        self.model.add(Dense(feature_dim, activation='linear'))

However, I want to set up an autoencoder-like setup without having two separate models. Where I have the comment BOTTLENECK HERE, I want a vector of some dimension, say bottleneck_dim.

After that, there should be some LSTM layers that reconstruct a sequence of the same dimensions as the initial input. However, I believe that adding a Dense layer will not return a single vector, but instead a vector for each timestep of the sequence?

  • Dense has been updated to automatically act as if wrapped with TimeDistributed - ie applied to a 3D input, you'll get (batch_size, seq_length, dense_size), one output vector per timestep.
  • A workaround is to place a Flatten() before it, so Dense receives an input of shape (batch_size, seq_length * lstm2_size) and returns a single vector. I wouldn't recommend it, however, as it's likely to corrupt temporal information (you're mixing channels and timesteps). Further, it hard-wires the network to seq_length, so you can no longer train or infer on any other seq_length.
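A minimal shape demonstration of both points above (assuming TensorFlow 2.x / tf.keras; the sizes are arbitrary): Dense applied to a 3D input keeps the timestep axis, while Flatten collapses it at the cost of fixing seq_length.

```python
from tensorflow.keras import Input, Sequential
from tensorflow.keras.layers import Dense, Flatten

seq_length, lstm2_size = 10, 32  # illustrative values

# Dense on a 3D input acts per-timestep: one 8-dim vector per step.
per_step = Sequential([Input((seq_length, lstm2_size)), Dense(8)])
print(per_step.output_shape)  # (None, 10, 8)

# Flatten first: a single 8-dim vector, but seq_length is now baked in.
flat = Sequential([Input((seq_length, lstm2_size)), Flatten(), Dense(8)])
print(flat.output_shape)  # (None, 8)
```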

A preferred alternative is Bidirectional(LSTM(..., return_sequences=False)), which returns only the last timestep's output, shaped (batch_size, lstm_bottleneck_size). To feed its output to the next LSTM, you'll need a RepeatVector(seq_length) after the return_sequences=False layer.
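Put together, the bottleneck section might look like this (a sketch assuming tf.keras; the layer sizes and bottleneck_dim are illustrative, and the question's surrounding BatchNormalization/Dropout layers are omitted for brevity). Note that Bidirectional concatenates the two directions, so each direction gets bottleneck_dim // 2 units:

```python
from tensorflow.keras import Input, Sequential
from tensorflow.keras.layers import LSTM, Bidirectional, Dense, RepeatVector

seq_length, feature_dim, bottleneck_dim = 200, 64, 100  # illustrative values

model = Sequential([
    Input((seq_length, feature_dim)),
    Bidirectional(LSTM(128, return_sequences=True)),       # encoder
    Bidirectional(LSTM(bottleneck_dim // 2,
                       return_sequences=False)),           # bottleneck vector
    RepeatVector(seq_length),                              # re-expand to a sequence
    Bidirectional(LSTM(128, return_sequences=True)),       # decoder
    Dense(feature_dim, activation='linear'),               # per-timestep reconstruction
])
print(model.output_shape)  # (None, 200, 64) -- same shape as the input
```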

Do mind the extent of the "bottleneck", though; eg if (seq_length, feature_dim) = (200, 64) and lstm_bottleneck_size = 400, that's (200 * 64) / (1 * 400) = x32 reduction, which is quite large and may overwhelm the network. I'd suggest aiming for x8 instead.
