
Keras ctc_decode shape must be rank 1 but is rank 2

I am implementing an OCR with Keras, with the TensorFlow backend.

I want to use the keras.backend.ctc_decode implementation.

I have a model class:

import keras


def ctc_lambda_func(args):
    y_pred, y_true, input_x_width, input_y_width = args
    # the 2 is critical here since the first couple outputs of the RNN
    # tend to be garbage:
    # y_pred = y_pred[:, 2:, :]
    return keras.backend.ctc_batch_cost(y_true, y_pred, input_x_width, input_y_width)


class ModelOcropy(keras.Model):
    def __init__(self, alphabet: str):
        self.img_height = 48
        self.lstm_size = 100
        self.alphabet_size = len(alphabet)

        # check backend input shape (channel first/last)
        if keras.backend.image_data_format() == "channels_first":
            input_shape = (1, None, self.img_height)
        else:
            input_shape = (None, self.img_height, 1)

        # data input
        input_x = keras.layers.Input(input_shape, name='x')

        # training inputs
        input_y = keras.layers.Input((None,), name='y')
        input_x_widths = keras.layers.Input([1], name='x_widths')
        input_y_widths = keras.layers.Input([1], name='y_widths')

        # network
        flattened_input_x = keras.layers.Reshape((-1, self.img_height))(input_x)
        bidirectional_lstm = keras.layers.Bidirectional(
            keras.layers.LSTM(self.lstm_size, return_sequences=True, name='lstm'),
            name='bidirectional_lstm'
        )(flattened_input_x)
        dense = keras.layers.Dense(self.alphabet_size, activation='relu')(bidirectional_lstm)
        y_pred = keras.layers.Softmax(name='y_pred')(dense)

        # ctc loss
        ctc = keras.layers.Lambda(ctc_lambda_func, output_shape=[1], name='ctc')(
            [dense, input_y, input_x_widths, input_y_widths]
        )

        # init keras model
        super().__init__(inputs=[input_x, input_x_widths, input_y, input_y_widths], outputs=[y_pred, ctc])

        # ctc decoder
        top_k_decoded, _ = keras.backend.ctc_decode(y_pred, input_x_widths)
        self.decoder = keras.backend.function([input_x, input_x_widths], [top_k_decoded[0]])
        # decoded_sequences = self.decoder([test_input_data, test_input_lengths])

My use of ctc_decode comes from another post: Keras using Lambda layers error with K.ctc_decode

I get an error:

ValueError: Shape must be rank 1 but is rank 2 for 'CTCGreedyDecoder' (op: 'CTCGreedyDecoder') with input shapes: [?,?,7], [?,1].

I guess I have to squeeze my input_x_widths, but Keras does not seem to have such a function (it always outputs something like (batch_size, 1)).

Indeed, the function is expecting a 1D tensor, and you've got a 2D tensor.

  • Keras does have the keras.backend.squeeze(x, axis=-1) function.
  • And you can also use keras.backend.reshape(x, (-1,)).

If you need to go back to the old shape after the operation, you can use either:

  • keras.backend.expand_dims(x)
  • keras.backend.reshape(x,(-1,1))
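As a minimal sketch of that shape round-trip (plain NumPy stands in for the backend tensors here; the numbers are made up):

```python
import numpy as np

# input_x_widths arrives as (batch_size, 1), e.g. from keras.layers.Input([1], ...)
widths = np.array([[120], [96], [150]])
print(widths.shape)       # (3, 1) -- rank 2, which CTCGreedyDecoder rejects

# flatten to rank 1, which is what ctc_decode expects
flat = widths.reshape(-1)
print(flat.shape)         # (3,)

# and back to the old (batch_size, 1) shape if needed
restored = flat.reshape(-1, 1)
print(restored.shape)     # (3, 1)
```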

Complete fix:

    # ctc decoder
    flattened_input_x_width = keras.backend.reshape(input_x_widths, (-1,))
    top_k_decoded, _ = keras.backend.ctc_decode(y_pred, flattened_input_x_width)
    self.decoder = keras.backend.function([input_x, flattened_input_x_width], [top_k_decoded[0]])
    # decoded_sequences = self.decoder([input_x, flattened_input_x_width])
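For intuition about what the op in the error message does: a greedy CTC decoder takes the argmax class at each timestep, merges consecutive repeats, and drops the blank label. A rough NumPy sketch follows; the greedy_ctc_decode helper and the blank index are illustrative only, not Keras API (TensorFlow's CTC ops treat num_classes - 1 as the blank):

```python
import numpy as np
from itertools import groupby

def greedy_ctc_decode(y_pred, blank):
    """Greedy CTC decode for one sample: argmax, collapse repeats, drop blanks."""
    best_path = np.argmax(y_pred, axis=-1)                  # best class per timestep
    collapsed = [label for label, _ in groupby(best_path)]  # merge repeated labels
    return [int(label) for label in collapsed if label != blank]

# toy softmax output: 5 timesteps, 3 classes (class 2 acts as the blank)
y_pred = np.array([
    [0.1, 0.8, 0.1],    # -> 1
    [0.1, 0.7, 0.2],    # -> 1 (repeat, collapsed)
    [0.1, 0.1, 0.8],    # -> 2 (blank, dropped)
    [0.9, 0.05, 0.05],  # -> 0
    [0.8, 0.1, 0.1],    # -> 0 (repeat, collapsed)
])
print(greedy_ctc_decode(y_pred, blank=2))  # [1, 0]
```

This also shows why the decoder needs the rank-1 sequence-length tensor: per sample, it only reads the first `width` timesteps of y_pred before collapsing.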
