
Implementing an LSTM network with Keras and TensorFlow

With limited knowledge, I've built an LSTM network. I would like to validate my assumptions and better understand the Keras API.

Network Code:

#...
model.add(LSTM(8, batch_input_shape=(None, 100, 4), return_sequences=True))
model.add(LeakyReLU())
model.add(LSTM(4, return_sequences=True))
model.add(LeakyReLU())
model.add(LSTM(1, return_sequences=False, activation='softmax'))
#...

I have tried to build a network with 4 input features and 2 hidden layers: the first with 8 neurons, the second with 4 neurons, and 1 neuron in the output layer.

[Figure: diagram of the intended network architecture]

The activation I wanted was LeakyReLU.

Q:

  1. Is the implementation correct?
    ie: does the code reflect what I planned?
  2. When using LeakyReLU, should I add a linear activation on the previous layer?
    ie: do I need to add activation='linear' to the LSTM layers?

As for the first question: "correct" in what sense? It depends on the problem you are modeling, so more details would need to be provided.

softmax is not used as the activation function when the last layer has only one output unit. That's because softmax normalizes the outputs so that they sum to one, i.e. so that they resemble a probability distribution. Therefore, if you use it on a layer with only one output unit, that unit's output would always be 1. Instead, either linear (for regression, i.e. predicting real values) or sigmoid (for binary classification) is used. Additionally, a Dense layer is commonly used as the last layer, acting as the final regressor or classifier. For example:

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense, LeakyReLU

model = Sequential()
model.add(LSTM(8, batch_input_shape=(None, 100, 4), return_sequences=True))
model.add(LeakyReLU())
model.add(LSTM(4, return_sequences=True))
model.add(LeakyReLU())
model.add(LSTM(1, return_sequences=False))
model.add(Dense(1, activation='sigmoid'))  # final Dense layer acts as the classifier
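To see why a one-unit softmax is degenerate, here is a small standalone sketch (plain NumPy, not part of the original answer) showing that softmax over a single-element vector always returns 1, and only produces a meaningful distribution with two or more units:

import numpy as np

def softmax(x):
    # shift by the max for numerical stability, then normalize to sum to 1
    e = np.exp(x - np.max(x))
    return e / e.sum()

print(softmax(np.array([-3.7])))      # [1.]  a single unit is always 1
print(softmax(np.array([42.0])))      # [1.]  regardless of the input value
print(softmax(np.array([0.5, 1.5])))  # ~[0.269 0.731]  two units give a real distribution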

As for the layers and number of units (according to the figure): it is a bit ambiguous, but I think there are three LSTM layers; the first has 4 units, the second has 8 units, and the last has 4 units. The final layer seems to be a Dense layer. So the model would look like this (assuming LeakyReLU is applied on the output of the LSTM layers):

model = Sequential()
model.add(LSTM(4, batch_input_shape=(None, 100, 4), return_sequences=True))
model.add(LeakyReLU())
model.add(LSTM(8, return_sequences=True))
model.add(LeakyReLU())
model.add(LSTM(4, return_sequences=False))
model.add(Dense(1, activation='sigmoid'))  # or activation='linear' if it is a regression problem

As for using the LeakyReLU layer: I guess you are right that a linear activation should be used as the activation of the previous layer (as also suggested here, although a Dense layer was used there). That's because the default activation of an LSTM layer is the hyperbolic tangent (i.e. tanh), which squashes the outputs into the range [-1, 1]; I think that may not be effective when you then apply LeakyReLU on top of it. However, I am not sure about this since I am not completely familiar with LeakyReLU's practical and recommended usage.
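If you do decide to try that combination, a minimal sketch of the resulting model might look as follows (this is only an illustration, reusing the layer sizes from the example above; activation='linear' simply disables the default tanh output activation so that LeakyReLU receives unsquashed values):

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense, LeakyReLU

model = Sequential()
# activation='linear' replaces the default tanh, so the values fed to
# LeakyReLU are not pre-squashed into [-1, 1]
model.add(LSTM(4, batch_input_shape=(None, 100, 4), return_sequences=True, activation='linear'))
model.add(LeakyReLU())
model.add(LSTM(8, return_sequences=True, activation='linear'))
model.add(LeakyReLU())
model.add(LSTM(4, return_sequences=False))
model.add(Dense(1, activation='sigmoid'))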


 