
How to implement Tensorflow batch normalization in LSTM

My current LSTM network looks like this:

import tensorflow as tf  # TF 1.x API

# CELL_SIZE and the input placeholder tf_x are defined elsewhere
rnn_cell = tf.contrib.rnn.BasicRNNCell(num_units=CELL_SIZE)
init_s = rnn_cell.zero_state(batch_size=1, dtype=tf.float32)  # very first hidden state
outputs, final_s = tf.nn.dynamic_rnn(
    rnn_cell,              # cell you have chosen
    tf_x,                  # input
    initial_state=init_s,  # the initial hidden state
    time_major=False,      # False: (batch, time step, input); True: (time step, batch, input)
)

# reshape 3D output to 2D for fully connected layer
outs2D = tf.reshape(outputs, [-1, CELL_SIZE])
net_outs2D = tf.layers.dense(outs2D, INPUT_SIZE)

# reshape back to 3D
outs = tf.reshape(net_outs2D, [-1, TIME_STEP, INPUT_SIZE])

Usually I apply tf.layers.batch_normalization as batch normalization, but I am not sure whether this works in an LSTM network.

b1 = tf.layers.batch_normalization(outputs, momentum=0.4, training=True)
d1 = tf.layers.dropout(b1, rate=0.4, training=True)

# reshape 3D output to 2D for fully connected layer
outs2D = tf.reshape(d1, [-1, CELL_SIZE])                       
net_outs2D = tf.layers.dense(outs2D, INPUT_SIZE)

# reshape back to 3D
outs = tf.reshape(net_outs2D, [-1, TIME_STEP, INPUT_SIZE])

Based on the paper "Layer Normalization" by Jimmy Lei Ba, Jamie Ryan Kiros, and Geoffrey E. Hinton:

TensorFlow now comes with tf.contrib.rnn.LayerNormBasicLSTMCell, an LSTM unit with layer normalization and recurrent dropout.

You can find its documentation in the TensorFlow API docs for tf.contrib.rnn.LayerNormBasicLSTMCell.
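
For example, a minimal sketch that swaps the BasicRNNCell in the question's code for LayerNormBasicLSTMCell, reusing the question's CELL_SIZE and tf_x; the dropout keep probability is just an illustrative value:

lstm_cell = tf.contrib.rnn.LayerNormBasicLSTMCell(
    num_units=CELL_SIZE,
    layer_norm=True,          # layer normalization on the cell's internal activations
    dropout_keep_prob=0.6,    # recurrent dropout keep probability (illustrative value)
)
init_s = lstm_cell.zero_state(batch_size=1, dtype=tf.float32)  # LSTMStateTuple of zeros
outputs, final_s = tf.nn.dynamic_rnn(
    lstm_cell,
    tf_x,
    initial_state=init_s,
    time_major=False,
)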

If you want to use batch norm for an RNN (LSTM or GRU), you can check out this implementation, or read the full description in the blog post.

However, layer normalization has more advantages than batch norm for sequence data. Specifically, "the effect of batch normalization is dependent on the mini-batch size and it is not obvious how to apply it to recurrent networks" (from the paper: Ba et al., "Layer Normalization").
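
To make that difference concrete, here is a small sketch (not part of the question's code) of which axes the statistics are computed over for a [batch, time, features] activation tensor; the tensor x and its shape are hypothetical:

import tensorflow as tf  # TF 1.x

# x: a hypothetical activation tensor of shape [batch, time, features]
x = tf.placeholder(tf.float32, [None, 10, 32])

# Batch norm statistics: per feature, computed over the batch (and time) axes,
# so the result depends on which examples are in the mini-batch and on its size.
bn_mean, bn_var = tf.nn.moments(x, axes=[0, 1], keep_dims=True)

# Layer norm statistics: per example and time step, computed over the feature
# axis only, so they do not depend on the rest of the mini-batch.
ln_mean, ln_var = tf.nn.moments(x, axes=[2], keep_dims=True)

x_bn = (x - bn_mean) / tf.sqrt(bn_var + 1e-6)  # normalization step only; no learned scale/offset
x_ln = (x - ln_mean) / tf.sqrt(ln_var + 1e-6)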

Layer normalization normalizes the summed inputs within each layer. You can check out the implementation of layer normalization for a GRU cell.
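
As a rough sketch of the idea (not the linked implementation), a layer-norm helper applied to a cell's summed inputs might look like this in the same TF 1.x style; the helper and all names in the usage comment are hypothetical:

import tensorflow as tf  # TF 1.x

def layer_norm(summed_inputs, scope):
    # Normalize a [batch, num_units] tensor over its feature axis, then apply a
    # learned gain and bias, as described in Ba et al. (2016).
    with tf.variable_scope(scope):
        num_units = summed_inputs.get_shape().as_list()[1]
        gain = tf.get_variable('gain', [num_units], initializer=tf.ones_initializer())
        bias = tf.get_variable('bias', [num_units], initializer=tf.zeros_initializer())
        mean, variance = tf.nn.moments(summed_inputs, axes=[1], keep_dims=True)
        return gain * (summed_inputs - mean) / tf.sqrt(variance + 1e-6) + bias

# Inside a GRU step, each summed input would be wrapped before its gate
# nonlinearity, e.g. (W_z, U_z, x_t, h_prev are hypothetical names):
#   z = tf.sigmoid(layer_norm(tf.matmul(x_t, W_z), 'ln_x_z') +
#                  layer_norm(tf.matmul(h_prev, U_z), 'ln_h_z'))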
