
Handwriting text recognition (CNN + LSTM + CTC): RNN explanation required

I am trying to understand the following code, which is written in Python and TensorFlow. I am trying to implement handwriting text recognition, and I am referring to the code here.

I don't understand why the RNN output is put through an "atrous_conv2d".

This is the architecture of my model: it takes the CNN output as input, passes it through this RNN stage, and then passes the result to a CTC layer.

def build_RNN(self, rnnIn4d):
    # remove the size-1 width axis: B x T x 1 x F -> B x T x F
    rnnIn3d = tf.squeeze(rnnIn4d, axis=[2])

    n_hidden = 256
    n_layers = 2
    cells = []

    for _ in range(n_layers):
        cells.append(tf.nn.rnn_cell.LSTMCell(num_units=n_hidden))

    stacked = tf.nn.rnn_cell.MultiRNNCell(cells)  # stack the two LSTM cells

    # bidirectional RNN: BxTxF -> two outputs of shape BxTxH (forward and backward)
    ((fw, bw), _) = tf.nn.bidirectional_dynamic_rnn(cell_fw=stacked, cell_bw=stacked, inputs=rnnIn3d,
                                                    dtype=rnnIn3d.dtype)

    # BxTxH + BxTxH -> BxTx2H -> BxTx1x2H
    concat = tf.expand_dims(tf.concat([fw, bw], 2), 2)

    # project output to chars (including blank): BxTx1x2H -> BxTx1xC -> BxTxC
    kernel = tf.Variable(tf.truncated_normal([1, 1, n_hidden * 2, len(self.char_list) + 1], stddev=0.1))
    rnn = tf.nn.atrous_conv2d(value=concat, filters=kernel, rate=1, padding='SAME')

    return tf.squeeze(rnn, axis=[2])

The input to the CTC loss layer will be of the form B x T x C:

B - batch size
T - max length of the output (twice the max word length, due to the blank char)
C - number of characters + 1 (the blank char)
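A minimal sketch of how such a B x T x C tensor can be fed into CTC loss (assuming TensorFlow 1.x; the placeholder names and the concrete values of B, T and C are made up for illustration, and tf.nn.ctc_loss expects time-major input by default):

import tensorflow as tf

B, T, C = 50, 32, 80                            # batch, time steps, chars + blank (example values)
logits = tf.placeholder(tf.float32, [B, T, C])  # stand-in for the output of build_RNN
labels = tf.sparse_placeholder(tf.int32)        # ground-truth texts as sparse character indices
seq_len = tf.fill([B], T)                       # one sequence length per batch element

ctc_in = tf.transpose(logits, [1, 0, 2])        # B x T x C -> T x B x C (time-major)
loss = tf.reduce_mean(tf.nn.ctc_loss(labels=labels, inputs=ctc_in,
                                     sequence_length=seq_len,
                                     ctc_merge_repeated=True))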

The input to the atrous convolution has shape (B x T x 1 x 2H) == (batch, height, width, channels), where H is the number of hidden units. The filter we are using is (1, 1, 2H, C) == (height, width, input channels, output channels).

After the atrous convolution we get (B, T, 1, C), and squeezing the width axis gives (B, T, C), which is the desired input for CTC.
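As a sanity check on these shapes, a small walkthrough (assuming TensorFlow 1.x; the numbers chosen for B, T, H and C are arbitrary):

import tensorflow as tf

B, T, H, C = 50, 32, 256, 80                            # batch, time steps, hidden units, chars + blank
concat = tf.placeholder(tf.float32, [B, T, 1, 2 * H])   # RNN output after concat + expand_dims
kernel = tf.Variable(tf.truncated_normal([1, 1, 2 * H, C], stddev=0.1))

rnn = tf.nn.atrous_conv2d(value=concat, filters=kernel, rate=1, padding='SAME')
print(rnn.shape)                          # (50, 32, 1, 80)  i.e. B x T x 1 x C
print(tf.squeeze(rnn, axis=[2]).shape)    # (50, 32, 80)     i.e. B x T x C, ready for CTC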

Note: we take a transpose before we input our image to the CNN, since tf is row major.

An atrous convolution with rate 1 is the same as a normal convolution layer.
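This can be checked numerically: a quick comparison (again assuming TensorFlow 1.x) of atrous_conv2d with rate=1 against a plain conv2d with stride 1:

import numpy as np
import tensorflow as tf

x = tf.constant(np.random.rand(1, 8, 8, 4), dtype=tf.float32)   # dummy feature map
k = tf.constant(np.random.rand(1, 1, 4, 3), dtype=tf.float32)   # dummy 1x1 kernel

atrous = tf.nn.atrous_conv2d(value=x, filters=k, rate=1, padding='SAME')
plain = tf.nn.conv2d(x, k, strides=[1, 1, 1, 1], padding='SAME')

with tf.Session() as sess:
    a, p = sess.run([atrous, plain])
    print(np.allclose(a, p))   # True: with rate=1 no holes are inserted into the kernel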
