Considering this LSTM based RNN:
# Instantiating the model
model = Sequential()
# Input layer
model.add(LSTM(30, activation="softsign", return_sequences=True, input_shape=(30, 1)))
# Hidden layers
model.add(LSTM(12, activation="softsign", return_sequences=True))
model.add(LSTM(12, activation="softsign", return_sequences=True))
# Final Hidden layer
model.add(LSTM(10, activation="softsign"))
# Output layer
model.add(Dense(10))
Is each output unit from the final hidden layer connected to each 12 output unit of the preceding hidden layer? (10*12 = 120 connections)
Is each one of the 10 outputs from the Dense layer connected to each one of the final hidden layer (10*10 = 100 connections)
Would there be a difference in term of connections between the Input layer and the 1st hidden layer if variable "return_sequence" was set to False (for both layers or for one)?
Thanks a lot for your help
Aymeric
Here is how I picture the RNN, please tell me if it's wrong:
Note about the picture:
Let's take a look at the model summary:
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
lstm (LSTM) (None, 30, 30) 3840
_________________________________________________________________
lstm_1 (LSTM) (None, 30, 12) 2064
_________________________________________________________________
lstm_2 (LSTM) (None, 30, 12) 1200
_________________________________________________________________
lstm_3 (LSTM) (None, 10) 920
_________________________________________________________________
dense (Dense) (None, 10) 110
=================================================================
Total params: 8,134
Trainable params: 8,134
Non-trainable params: 0
_________________________________________________________________
return_sequences=True
, the default is return_sequences=False
, which means only the last output from the final LSTM layer is used by the Dense
layer.return_sequence=True
and return_sequence=False
is that when it is set to false, only the final output will be sent to the next layer. So if I have a time series data with 30 events (1, 30, 30), only the output from the 30th event will be passed along to the next layer. The computations are the same, so there will be no difference in weights. Do know that there might be some shape mis-matches if you try to set some of these to be False
out of the box.
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.