How to monitor accuracy with a CTC loss function and Datasets? (runnable code included)

Question

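I have been trying to speed up training of my CRNN network for optical character recognition, but I cannot get the accuracy metric to work when using TFRecords and tf.data.Dataset pipelines. I previously used a Keras Sequence and had it working. Here is a complete runnable toy example showing my problem (tested with Tensorflow 2.4.1):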

import random
import numpy as np
import tensorflow as tf
import tensorflow.keras.backend as K
from tensorflow.python.keras import Input, Model
from tensorflow.python.keras.layers import Dense, Layer, Bidirectional, GRU, Reshape, Activation
from tensorflow.python.keras.optimizer_v2.adam import Adam

AUTOTUNE = tf.data.experimental.AUTOTUNE
CHAR_VECTOR = "ABC"
IMG_W = 10
IMG_H = 10
N_CHANNELS = 3


class CTCLayer(Layer):
    def __init__(self, name=None):
        super().__init__(name=name)
        self.loss_fn = K.ctc_batch_cost

    def call(self, y_true, y_pred, label_length):
        # Compute the training-time loss value and add it
        # to the layer using `self.add_loss()`.
        batch_len = tf.cast(tf.shape(y_true)[0], dtype="int64")
        input_length = tf.cast(tf.shape(y_pred)[1], dtype="int64")
        input_length = input_length * tf.ones(shape=(batch_len, 1), dtype="int64")

        loss = self.loss_fn(y_true, y_pred, input_length, label_length)
        self.add_loss(loss)

        # At test time, just return the computed predictions
        return y_pred


def get_model():
    n_classes = len(CHAR_VECTOR) + 1

    input = Input(name='image', shape=(IMG_W, IMG_H, N_CHANNELS), dtype='float32')
    label = Input(name='label', shape=[None], dtype='float32')
    label_length = Input(name='label_length', shape=[None], dtype='int64')

    x = Reshape(target_shape=(IMG_W, np.prod(input.shape[2:])), name='reshape')(input)
    x = Dense(24, activation='relu', name='dense1')(x)
    x = Bidirectional(GRU(24, return_sequences=True, name="GRU"), merge_mode="sum")(x)
    x = Dense(n_classes, name='dense2')(x)
    y_pred = Activation('softmax', name='softmax')(x)

    output = CTCLayer(name="ctc")(label, y_pred, label_length)

    m = Model(inputs=[input, label, label_length], outputs=output)
    return m


def image_feature(value):
    """Returns a bytes_list from a string / byte."""
    return tf.train.Feature(bytes_list=tf.train.BytesList(value=[tf.io.encode_jpeg(value).numpy()]))


def float_feature_list(value):
    """Returns a list of float_list from a float / double."""
    return tf.train.Feature(float_list=tf.train.FloatList(value=value))


def int64_feature(value):
    """Returns an int64_list from a bool / enum / int / uint."""
    return tf.train.Feature(int64_list=tf.train.Int64List(value=[value]))


def create_example(image, label, label_length):
    feature = {
        "image": image_feature(image),
        "label": float_feature_list(label),
        "label_length": int64_feature(label_length),
    }
    return tf.train.Example(features=tf.train.Features(feature=feature))


def parse_tfrecord_fn(example):
    feature_description = {
        "image": tf.io.FixedLenFeature([], tf.string),
        "label": tf.io.VarLenFeature(tf.float32),
        "label_length": tf.io.FixedLenFeature([1], tf.int64),
    }
    example = tf.io.parse_single_example(example, feature_description)
    example["image"] = tf.image.convert_image_dtype(tf.io.decode_jpeg(example["image"], channels=3), dtype="float32")
    example["label"] = tf.sparse.to_dense(example["label"])

    return example


def generate_tfrecords(n):
    with tf.io.TFRecordWriter(filename) as writer:
        for i in range(n):
            random_img = np.random.random((IMG_W, IMG_H, N_CHANNELS))
            label_length = random.randint(1, max_text_len)
            label = np.random.randint(0, len(CHAR_VECTOR), max_text_len)
            example = create_example(random_img, label, label_length)
            writer.write(example.SerializeToString())


class DataGenerator(tf.keras.utils.Sequence):
    def __len__(self):
        return steps_per_epoch

    def __getitem__(self, index):
        outputs = np.zeros([batch_size])
        dataset = get_dataset()
        inputs = next(iter(dataset.take(1)))
        return inputs, outputs


def get_dataset():
    generate_tfrecords(batch_size * epochs * steps_per_epoch)
    dataset = (
        tf.data.TFRecordDataset(filename, num_parallel_reads=AUTOTUNE)
        .map(parse_tfrecord_fn, num_parallel_calls=AUTOTUNE)
        .batch(batch_size)
        .prefetch(AUTOTUNE)
    )
    return dataset


if __name__ == "__main__":
    batch_size = 9
    epochs = 7
    steps_per_epoch = 8
    max_text_len = 5
    filename = "test.tfrec"
    use_generator = False
    data = DataGenerator() if use_generator else get_dataset()

    model = get_model()
    '''This fails when use_generator == False, removing the 
     metric solves it'''
    model.compile(optimizer=Adam(), metrics=["accuracy"])
    model.fit(data, epochs=epochs, steps_per_epoch=steps_per_epoch)

Setting use_generator = True or removing metrics=["accuracy"] makes it run without errors.

As you can see, DataGenerator uses the same data from the TFRecords, but it also returns some zeros, which for whatever reason seems to be the magic sauce:

class DataGenerator(tf.keras.utils.Sequence):
    def __len__(self):
        return steps_per_epoch

    def __getitem__(self, index):
        outputs = np.zeros([batch_size])
        dataset = get_dataset()
        inputs = next(iter(dataset.take(1)))
        return inputs, outputs

I have also noticed that this Keras example has the same problem (it crashes if you edit the code to monitor accuracy): https://keras.io/examples/vision/captcha_ocr/

Is there a way to mimic the behaviour of __getitem__ with a Dataset, or some other way of getting the accuracy without using a Sequence?

Answer 1

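When you pass a dataset for training, you need to include the outputs. Your generator function (correctly) returns a tuple (inputs, outputs); that tuple is missing when you pass the dataset directly.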

If you modify the mapper function as follows:

def parse_tfrecord_fn(example):
    feature_description = {
        "image": tf.io.FixedLenFeature([], tf.string),
        "label": tf.io.VarLenFeature(tf.float32),
        "label_length": tf.io.FixedLenFeature([1], tf.int64),
    }
    tf_example = tf.io.parse_single_example(example, feature_description)
    tf_example["image"] = tf.image.convert_image_dtype(tf.io.decode_jpeg(tf_example["image"], channels=3), dtype="float32")
    tf_example["label"] = tf.sparse.to_dense(tf_example["label"])

    return tf_example, tf.constant([0])

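The code will now run correctly with use_generator = False. Note that accuracy makes no sense as a metric here: it compares the network's output (y_pred) with the target (tf.constant([0])). To measure accuracy, you need the label as the target, and you need a metric that can compare your network's output, of shape (batch_size, max_sequence_length, n_classes), with the labels. That is, you need a sparse categorical accuracy metric.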
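To make "comparing the output with the labels" concrete, here is a minimal framework-free sketch of one way to score CTC predictions: greedy decoding followed by an exact-match check. It is not taken from the answer or the linked notebook. The function names ctc_greedy_decode and sequence_accuracy are hypothetical, the inputs are plain nested lists, and the blank class is assumed to be the last index, which is the convention K.ctc_batch_cost uses.

```python
def ctc_greedy_decode(frame_probs, blank_index):
    """Collapse one sequence of per-frame class probabilities into a label list."""
    decoded = []
    previous = None
    for frame in frame_probs:
        # Pick the most likely class for this time step.
        best = max(range(len(frame)), key=lambda i: frame[i])
        # Keep a class only if it differs from the previous frame and is not the blank.
        if best != previous and best != blank_index:
            decoded.append(best)
        previous = best
    return decoded


def sequence_accuracy(batch_probs, labels, label_lengths, blank_index):
    """Fraction of samples whose decoded sequence equals the unpadded label."""
    correct = 0
    for probs, label, length in zip(batch_probs, labels, label_lengths):
        target = [int(c) for c in label[:length]]  # drop the padding
        if ctc_greedy_decode(probs, blank_index) == target:
            correct += 1
    return correct / len(labels)


# Toy batch: 2 samples, 4 time steps, 4 classes ("A", "B", "C", blank).
BLANK = 3  # assumption: blank is the last class
batch_probs = [
    # argmax per frame: 0, 0, blank, 1
    [[0.7, 0.1, 0.1, 0.1], [0.6, 0.2, 0.1, 0.1],
     [0.1, 0.1, 0.1, 0.7], [0.1, 0.7, 0.1, 0.1]],
    # argmax per frame: 2, blank, 2, 2
    [[0.1, 0.1, 0.7, 0.1], [0.1, 0.1, 0.1, 0.7],
     [0.1, 0.1, 0.6, 0.2], [0.1, 0.2, 0.6, 0.1]],
]
labels = [[0.0, 1.0, 0.0], [2.0, 1.0, 0.0]]  # float labels padded to a fixed length
label_lengths = [2, 2]

print(sequence_accuracy(batch_probs, labels, label_lengths, BLANK))
```

The first sample decodes to [0, 1] and matches its label. The second decodes to [2, 2] and does not match [2, 1], so half of the batch counts as correct. Inside Keras the same idea would have to be expressed with tensor operations and fed the real labels as targets.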

You can find my notebook here: https://colab.research.google.com/drive/1z2NCQnYlG_UIpN7bBNpXXbLwy3JE_PX2?usp=sharing

Answer 2

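There may be some issue with [accuracy] when it is used with tf.data, but I am not sure whether that is the main cause in your case, or whether the issue still exists. If I try the following, it runs anyway without Sequence (using tf.data).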

model.compile(optimizer=Adam(), metrics=['sparse_categorical_accuracy'])
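For readers unfamiliar with the metric named in that compile call, the following plain-Python sketch shows what sparse categorical accuracy computes conceptually: the argmax of each prediction is compared with the integer label at the same position. It is an illustration with a hypothetical helper, not the Keras implementation. It also assumes that labels and predictions line up position by position, which is generally not the case for CTC outputs, so the value may not reflect recognition accuracy.

```python
def sparse_categorical_accuracy(y_true, y_pred):
    """Mean of (argmax(y_pred) == y_true) over every position in the batch."""
    matches = []
    for true_seq, pred_seq in zip(y_true, y_pred):
        for true_class, frame in zip(true_seq, pred_seq):
            # Index of the highest-probability class for this position.
            predicted = max(range(len(frame)), key=lambda i: frame[i])
            matches.append(predicted == int(true_class))
    return sum(matches) / len(matches)


# One sample, three positions, three classes.
y_true = [[0, 1, 2]]
y_pred = [[[0.8, 0.1, 0.1],   # argmax 0, matches label 0
           [0.2, 0.7, 0.1],   # argmax 1, matches label 1
           [0.6, 0.3, 0.1]]]  # argmax 0, does not match label 2

print(sparse_categorical_accuracy(y_true, y_pred))
```

Two of the three positions match in this toy input.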

Answer 1
2 accepted 2021-05-19 20:37:18

Answer 2
1 2021-05-17 09:45:33
