
Simple questions about LSTM networks from Keras

Considering this LSTM-based RNN:

# Imports (standalone Keras; with TF2, use tensorflow.keras instead)
from keras.models import Sequential
from keras.layers import LSTM, Dense

# Instantiating the model
model = Sequential()

# Input layer
model.add(LSTM(30, activation="softsign", return_sequences=True, input_shape=(30, 1)))

# Hidden layers
model.add(LSTM(12, activation="softsign", return_sequences=True))
model.add(LSTM(12, activation="softsign", return_sequences=True))

# Final hidden layer
model.add(LSTM(10, activation="softsign"))

# Output layer
model.add(Dense(10))
  1. Is each output unit from the final hidden layer connected to each of the 12 output units of the preceding hidden layer (10 * 12 = 120 connections)?

  2. Is each of the 10 outputs from the Dense layer connected to each of the final hidden layer's 10 units (10 * 10 = 100 connections)?

  3. Would there be a difference in terms of connections between the input layer and the first hidden layer if return_sequences were set to False (for both layers or for one)?

Thanks a lot for your help.

Aymeric

Here is how I picture the RNN; please tell me if it's wrong:

[image: the asker's diagram of the network]

Notes about the picture:

  • X = one training example, i.e. a vector of 30 bitcoin (BTC) values (each value represents one day, 30 days total)
  • Output vector = 10 values that are supposed to be the next 10 values of bitcoin (the next 10 days)
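
To make those shapes concrete, here is a minimal sketch of how a daily price series could be sliced into (30, 1) input windows and 10-day targets. The array name prices, its length, and the windowing scheme are assumptions for illustration, not taken from the question:

import numpy as np

# Hypothetical 1-D array of daily BTC prices (200 days assumed for the example)
prices = np.random.rand(200).astype("float32")

window, horizon = 30, 10  # 30 input days, 10 days to predict
X, y = [], []
for start in range(len(prices) - window - horizon + 1):
    X.append(prices[start:start + window])                     # the 30 past values
    y.append(prices[start + window:start + window + horizon])  # the 10 next values

X = np.array(X)[..., np.newaxis]  # shape (samples, 30, 1), as input_shape=(30, 1) expects
y = np.array(y)                   # shape (samples, 10), matching Dense(10)
print(X.shape, y.shape)           # (161, 30, 1) (161, 10)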

Let's take a look at the model summary:

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
lstm (LSTM)                  (None, 30, 30)            3840      
_________________________________________________________________
lstm_1 (LSTM)                (None, 30, 12)            2064      
_________________________________________________________________
lstm_2 (LSTM)                (None, 30, 12)            1200      
_________________________________________________________________
lstm_3 (LSTM)                (None, 10)                920       
_________________________________________________________________
dense (Dense)                (None, 10)                110       
=================================================================
Total params: 8,134
Trainable params: 8,134
Non-trainable params: 0
_________________________________________________________________
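
The Param # column can be checked by hand: a Keras LSTM layer has 4 * (units * (units + input_dim) + units) parameters (four gates, each with input weights, recurrent weights, and a bias), and a Dense layer has (input_dim + 1) * units. A quick verification:

def lstm_params(input_dim, units):
    # 4 gates, each with input weights, recurrent weights, and a bias vector
    return 4 * (units * (units + input_dim) + units)

print(lstm_params(1, 30))   # 3840  (lstm)
print(lstm_params(30, 12))  # 2064  (lstm_1)
print(lstm_params(12, 12))  # 1200  (lstm_2)
print(lstm_params(12, 10))  # 920   (lstm_3)
print((10 + 1) * 10)        # 110   (dense: 10 weights + 1 bias per unit)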
  1. Since you don't set return_sequences=True on the final LSTM layer, the default return_sequences=False applies there, which means only the last output from that layer is used by the Dense layer.
  2. Yes. But it is actually 110, because each Dense unit also has a bias: (10 + 1) * 10.
  3. There would not. The difference between return_sequences=True and return_sequences=False is that when it is set to False, only the final output is sent to the next layer. So with time-series data of 30 steps (shape (1, 30, 30)), only the output of the 30th step is passed along. The computations are the same, so there is no difference in the weights. Do note that you may get shape mismatches if you set some of these to False without adjusting the rest of the model.
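
A minimal sketch that makes the shape difference visible (a hypothetical single-layer toy model, not the asker's network):

from keras.models import Sequential
from keras.layers import LSTM

# With return_sequences=True, the layer emits one output per time step...
seq = Sequential([LSTM(10, return_sequences=True, input_shape=(30, 1))])
print(seq.output_shape)   # (None, 30, 10)

# ...with the default return_sequences=False, only the last step's output
last = Sequential([LSTM(10, input_shape=(30, 1))])
print(last.output_shape)  # (None, 10)

# The weight count is identical either way, as noted above
print(seq.count_params(), last.count_params())  # 480 480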
