How to implement TensorFlow batch normalization in an LSTM
My current LSTM network looks like this.
rnn_cell = tf.contrib.rnn.BasicRNNCell(num_units=CELL_SIZE)
init_s = rnn_cell.zero_state(batch_size=1, dtype=tf.float32)  # very first hidden state
outputs, final_s = tf.nn.dynamic_rnn(
    rnn_cell,              # cell you have chosen
    tf_x,                  # input
    initial_state=init_s,  # the initial hidden state
    time_major=False,      # False: (batch, time step, input); True: (time step, batch, input)
)
# reshape 3D output to 2D for fully connected layer
outs2D = tf.reshape(outputs, [-1, CELL_SIZE])
net_outs2D = tf.layers.dense(outs2D, INPUT_SIZE)
# reshape back to 3D
outs = tf.reshape(net_outs2D, [-1, TIME_STEP, INPUT_SIZE])
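The reshape-dense-reshape pattern above can be sketched in plain NumPy (a minimal sketch; the sizes `BATCH`, `TIME_STEP`, `CELL_SIZE`, and `INPUT_SIZE` are illustrative stand-ins for the graph's constants):

```python
import numpy as np

BATCH, TIME_STEP, CELL_SIZE, INPUT_SIZE = 2, 5, 8, 3

# Stand-in for the RNN output: one CELL_SIZE vector per (sample, time step).
outputs = np.random.randn(BATCH, TIME_STEP, CELL_SIZE)

# Fold the time axis into the batch axis so one dense layer
# is applied to every time step independently.
outs2D = outputs.reshape(-1, CELL_SIZE)          # (BATCH*TIME_STEP, CELL_SIZE)

# A dense layer is a matmul with a weight matrix plus a bias.
W = np.random.randn(CELL_SIZE, INPUT_SIZE)
b = np.zeros(INPUT_SIZE)
net_outs2D = outs2D @ W + b                      # (BATCH*TIME_STEP, INPUT_SIZE)

# Restore the (batch, time, feature) layout.
outs = net_outs2D.reshape(-1, TIME_STEP, INPUT_SIZE)
print(outs.shape)                                # (2, 5, 3)
```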
Usually, I apply tf.layers.batch_normalization as batch normalization, but I am not sure whether this works in an LSTM network.
b1 = tf.layers.batch_normalization(outputs, momentum=0.4, training=True)
d1 = tf.layers.dropout(b1, rate=0.4, training=True)
# reshape 3D output to 2D for fully connected layer
outs2D = tf.reshape(d1, [-1, CELL_SIZE])
net_outs2D = tf.layers.dense(outs2D, INPUT_SIZE)
# reshape back to 3D
outs = tf.reshape(net_outs2D, [-1, TIME_STEP, INPUT_SIZE])
Based on the paper "Layer Normalization" by Jimmy Lei Ba, Jamie Ryan Kiros, and Geoffrey E. Hinton, TensorFlow now comes with tf.contrib.rnn.LayerNormBasicLSTMCell, an LSTM cell with layer normalization and recurrent dropout.
If you want to use batch norm for an RNN (LSTM or GRU), you can check out this implementation, or read the full description in this blog post.
However, layer normalization has an advantage over batch norm on sequence data. Specifically, "the effect of batch normalization is dependent on the mini-batch size and it is not obvious how to apply it to recurrent networks" (from the paper Ba et al., Layer Normalization).
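That mini-batch dependence is easy to see numerically (a minimal NumPy sketch; shapes are illustrative): batch norm computes its mean and variance across the batch axis, so the same sample is normalized differently depending on which other samples happen to share its mini-batch.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 6))          # a mini-batch: 4 samples, 6 features

def batch_norm(batch, eps=1e-5):
    # Statistics taken across the batch axis (axis 0), per feature.
    mu = batch.mean(axis=0)
    var = batch.var(axis=0)
    return (batch - mu) / np.sqrt(var + eps)

# Normalize the very same sample inside two different mini-batches.
out_a = batch_norm(x)[0]
out_b = batch_norm(x[:2])[0]         # smaller batch -> different statistics

print(np.allclose(out_a, out_b))     # False: the result depends on the batch
```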
Layer normalization, by contrast, normalizes the summed inputs within each layer. You can check out the implementation of layer normalization for a GRU cell:
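As a reference for what that normalization computes, here is layer normalization itself in plain NumPy (a minimal sketch of the formula from the Ba et al. paper; the gain `g` and bias `b` are the usual learnable per-feature parameters, initialized here to ones and zeros):

```python
import numpy as np

def layer_norm(a, g=None, b=None, eps=1e-5):
    """Normalize the summed inputs `a` of one layer, per sample.

    Statistics are taken over the feature axis (the last axis), so the
    result is independent of the other samples in the mini-batch.
    """
    if g is None:
        g = np.ones(a.shape[-1])   # learnable gain, initialized to 1
    if b is None:
        b = np.zeros(a.shape[-1])  # learnable bias, initialized to 0
    mu = a.mean(axis=-1, keepdims=True)
    sigma = a.std(axis=-1, keepdims=True)
    return g * (a - mu) / (sigma + eps) + b

x = np.random.randn(2, 5, 8)       # (batch, time, features)
y = layer_norm(x)
# Each (sample, time-step) vector now has ~zero mean and ~unit variance.
print(y.shape)                     # (2, 5, 8)
```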