
Simple questions about LSTM networks from Keras

Considering this LSTM based RNN:

# Required imports
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

# Instantiating the model
model = Sequential()

# Input layer
model.add(LSTM(30, activation="softsign", return_sequences=True, input_shape=(30, 1)))

# Hidden layers
model.add(LSTM(12, activation="softsign", return_sequences=True))
model.add(LSTM(12, activation="softsign", return_sequences=True))

# Final Hidden layer
model.add(LSTM(10, activation="softsign"))

# Output layer
model.add(Dense(10))
  1. Is each output unit of the final hidden layer connected to each of the 12 output units of the preceding hidden layer? (10 * 12 = 120 connections)

  2. Is each of the 10 outputs of the Dense layer connected to each of the 10 units of the final hidden layer? (10 * 10 = 100 connections)

  3. Would there be a difference in terms of connections between the Input layer and the 1st hidden layer if return_sequences were set to False (for both layers, or for just one)?

Thanks a lot for your help

Aymeric

Here is how I picture the RNN; please tell me if it's wrong:

[image: the author's diagram of the RNN]

Note about the picture:

  • X = one training example, i.e., a vector of 30 bitcoin (BTC) values (each value represents one day, 30 days total)
  • Output vector = 10 values that are supposed to be the next 10 BTC values (the next 10 days); a sketch of the resulting array shapes follows this list
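For concreteness, here is a minimal sketch of how those training arrays would be shaped before being fed to the model above. The variable names, sample count, and compile settings are illustrative assumptions, not part of the original post:

import numpy as np

num_samples = 500  # hypothetical number of 30-day windows cut from the BTC series

# Each input X[i] holds 30 consecutive daily BTC values, one feature per timestep.
X = np.zeros((num_samples, 30, 1), dtype="float32")

# Each target y[i] holds the 10 daily BTC values that follow X[i]'s window.
y = np.zeros((num_samples, 10), dtype="float32")

# Assumed training setup; the original post does not show compile/fit.
model.compile(optimizer="adam", loss="mse")
model.fit(X, y, epochs=10)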

Let's take a look at the model summary:

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
lstm (LSTM)                  (None, 30, 30)            3840      
_________________________________________________________________
lstm_1 (LSTM)                (None, 30, 12)            2064      
_________________________________________________________________
lstm_2 (LSTM)                (None, 30, 12)            1200      
_________________________________________________________________
lstm_3 (LSTM)                (None, 10)                920       
_________________________________________________________________
dense (Dense)                (None, 10)                110       
=================================================================
Total params: 8,134
Trainable params: 8,134
Non-trainable params: 0
_________________________________________________________________
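Each Param # in that table can be reproduced by hand. A Keras LSTM layer has four gates, and each gate has an input kernel, a recurrent kernel, and a bias, giving 4 * (input_dim + units + 1) * units parameters; a Dense layer has (input_dim + 1) * units. A minimal check (the helper functions are mine, not Keras APIs):

# Four gates, each with an input kernel, a recurrent kernel, and a bias.
def lstm_params(input_dim, units):
    return 4 * (input_dim + units + 1) * units

# One weight per input plus one bias, per output unit.
def dense_params(input_dim, units):
    return (input_dim + 1) * units

print(lstm_params(1, 30))    # 3840 (lstm)
print(lstm_params(30, 12))   # 2064 (lstm_1)
print(lstm_params(12, 12))   # 1200 (lstm_2)
print(lstm_params(12, 10))   # 920  (lstm_3)
print(dense_params(10, 10))  # 110  (dense)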
  1. Since you don't set return_sequences=True on the final LSTM layer, the default return_sequences=False applies, which means only the last timestep's output of that layer is passed to the Dense layer.
  2. Yes, but the count is actually 110 rather than 100 because each Dense unit also has a bias: (10 + 1) * 10 = 110.
  3. There would not. The difference between return_sequences=True and return_sequences=False is that when it is set to False, only the final timestep's output is sent to the next layer. So with time-series data of 30 events (shape (1, 30, 30)), only the output from the 30th event is passed along. The computations are the same, so there is no difference in weights. Note, though, that you may get shape mismatches if you set some of these flags to False out of the box; the sketch below shows the shape difference.
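To make point 3 concrete, here is a minimal sketch of the shape difference (assumes TensorFlow 2.x; the dummy zeros stand in for one 30-event sample with 30 features, matching the example above):

import numpy as np
from tensorflow.keras.layers import LSTM

x = np.zeros((1, 30, 30), dtype="float32")  # (batch, timesteps, features)

# return_sequences=True: one 30-unit output is kept for every timestep.
print(LSTM(30, return_sequences=True)(x).shape)   # (1, 30, 30)

# return_sequences=False (the default): only the 30th timestep's output remains.
print(LSTM(30, return_sequences=False)(x).shape)  # (1, 30)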
