Error when training a deep learning model

Question

So I designed a CNN and compiled it with the following parameters:

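import csv

import numpy as np
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.optimizers import RMSprop
from tensorflow.keras.preprocessing.image import ImageDataGenerator

training_file_loc = "8-SignLanguageMNIST/sign_mnist_train.csv"
testing_file_loc = "8-SignLanguageMNIST/sign_mnist_test.csv"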

def getData(filename):
    images = []
    labels = []
    with open(filename) as csv_file:
        file = csv.reader(csv_file, delimiter = ",")
        next(file, None)
        
        for row in file:
            label = row[0]
            data = row[1:]
            img = np.array(data).reshape(28,28)
            
            images.append(img)
            labels.append(label)
        
        images = np.array(images).astype("float64")
        labels = np.array(labels).astype("float64")
        
    return images, labels

training_images, training_labels = getData(training_file_loc)
testing_images, testing_labels = getData(testing_file_loc)

print(training_images.shape, training_labels.shape)
print(testing_images.shape, testing_labels.shape)

training_images = np.expand_dims(training_images, axis = 3)
testing_images = np.expand_dims(testing_images, axis = 3)

training_datagen = ImageDataGenerator(
    rescale = 1/255,
    rotation_range = 45,
    width_shift_range = 0.2,
    height_shift_range = 0.2,
    shear_range = 0.2,
    zoom_range = 0.2,
    horizontal_flip = True,
    fill_mode = "nearest"
)

training_generator = training_datagen.flow(
    training_images,
    training_labels,
    batch_size = 64,
)


validation_datagen = ImageDataGenerator(
    rescale = 1/255,
    rotation_range = 45,
    width_shift_range = 0.2,
    height_shift_range = 0.2,
    shear_range = 0.2,
    zoom_range = 0.2,
    horizontal_flip = True,
    fill_mode = "nearest"
)

validation_generator = validation_datagen.flow(
    testing_images,
    testing_labels,
    batch_size = 64,
)

model = tf.keras.Sequential([
    keras.layers.Conv2D(16, (3, 3), input_shape = (28, 28, 1), activation = "relu"),
    keras.layers.MaxPooling2D(2, 2),
    keras.layers.Conv2D(32, (3, 3), activation = "relu"),
    keras.layers.MaxPooling2D(2, 2),
    keras.layers.Flatten(),
    keras.layers.Dense(256, activation = "relu"),
    keras.layers.Dropout(0.25),
    keras.layers.Dense(512, activation = "relu"),
    keras.layers.Dropout(0.25),
    keras.layers.Dense(26, activation = "softmax")
])

model.compile(
    loss = "categorical_crossentropy",
    optimizer = RMSprop(learning_rate = 0.001),
    metrics = ["accuracy"]
)

But when I run model.fit(), I get the following error:

ValueError: Shapes (None, 1) and (None, 24) are incompatible

After changing the loss function to sparse_categorical_crossentropy, the program worked fine.

I don't understand why this happened.

Can anyone explain this, and also the difference between those two loss functions?

Answer 1

The issue is that categorical_crossentropy expects one-hot-encoded labels: for each sample it expects a tensor of length num_classes in which the element at index label is set to 1 and every other element is 0.
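A minimal NumPy sketch of the one-hot encoding described above, assuming integer labels in the range 0 to num_classes - 1. Each row is taken from an identity matrix, which gives the same layout that categorical_crossentropy expects (the helper name one_hot is hypothetical, not a Keras function):

```python
import numpy as np

def one_hot(labels, num_classes):
    """Return a (len(labels), num_classes) float32 array with a single 1 per row."""
    labels = np.asarray(labels, dtype=np.int64)
    # Row i of the identity matrix is the one-hot vector for class i.
    return np.eye(num_classes, dtype=np.float32)[labels]

labels = np.array([0, 3, 2])
encoded = one_hot(labels, num_classes=4)
print(encoded.shape)  # (3, 4)
```

A label array of shape (N,) becomes an array of shape (N, num_classes), which is the shape the softmax output layer produces.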

On the other hand, sparse_categorical_crossentropy uses integer labels directly (its use case is a large number of classes, where one-hot-encoded labels would waste memory on a lot of zeros). I believe, but can't confirm, that categorical_crossentropy is faster to run than its sparse counterpart.
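To make the difference concrete, here is a NumPy sketch that computes the same cross-entropy both ways. It is not Keras's actual implementation, and the function names are hypothetical. Both losses reduce to the negative log of the probability assigned to the true class; they differ only in the target format: the one-hot version takes an (N, C) array, the sparse version takes N integer indices.

```python
import numpy as np

def categorical_ce(y_true_onehot, y_pred, eps=1e-7):
    """Mean cross-entropy with one-hot targets of shape (N, C)."""
    y_pred = np.clip(y_pred, eps, 1.0 - eps)
    return float(np.mean(-np.sum(y_true_onehot * np.log(y_pred), axis=1)))

def sparse_categorical_ce(y_true_int, y_pred, eps=1e-7):
    """Mean cross-entropy with integer targets of shape (N,)."""
    y_pred = np.clip(y_pred, eps, 1.0 - eps)
    # Pick the predicted probability of the true class for each sample.
    picked = y_pred[np.arange(len(y_true_int)), y_true_int]
    return float(np.mean(-np.log(picked)))

# Softmax-style predictions for 2 samples and 3 classes.
y_pred = np.array([[0.7, 0.2, 0.1],
                   [0.1, 0.1, 0.8]])
y_int = np.array([0, 2])        # sparse targets, shape (2,)
y_onehot = np.eye(3)[y_int]     # one-hot targets, shape (2, 3)

print(categorical_ce(y_onehot, y_pred))
print(sparse_categorical_ce(y_int, y_pred))
```

Both calls print the same value; passing y_int to the one-hot version instead is exactly the kind of (N, 1) vs (N, C) mismatch Keras reports in the question.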

For your case, with 26 classes, I'd recommend using the non-sparse version and transforming your labels to one-hot encoding like so:

import csv

import numpy as np
import tensorflow as tf

def getData(filename):
    images = []
    labels = []
    with open(filename) as csv_file:
        file = csv.reader(csv_file, delimiter = ",")
        next(file, None)
        
        for row in file:
            label = row[0]
            data = row[1:]
            img = np.array(data).reshape(28,28)
            
            images.append(img)
            labels.append(label)
        
        images = np.array(images).astype("float64")
        labels = np.array(labels).astype("float64")
        
    return images, tf.keras.utils.to_categorical(labels, num_classes=26) # you can omit num_classes to have it computed from the data
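If num_classes is omitted, to_categorical infers it as the largest label plus one, so the width of the encoded labels then depends on which labels occur in the data, and it still has to match the size of the final Dense layer. A NumPy sketch of that inference (it mirrors the behavior rather than calling the Keras helper; the label values are made up):

```python
import numpy as np

# Float labels, like the ones getData builds before encoding (values are made up).
labels = np.array([0.0, 5.0, 2.0, 5.0])

# With num_classes omitted, the width is inferred as max(label) + 1.
num_classes = int(labels.max()) + 1
one_hot = np.eye(num_classes, dtype="float32")[labels.astype(int)]

print(one_hot.shape)  # (4, 6)
```

With an explicit num_classes=26 the width is always 26, matching Dense(26, activation = "softmax") regardless of which labels appear in the file.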

Side note: unless you have a reason to use float64 for images, I'd switch to float32 (it halves the memory required for the dataset, and the model likely converts its inputs to float32 as the first operation anyway).
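The memory claim is easy to check with NumPy. This sketch assumes the (samples, 28, 28, 1) shape produced by np.expand_dims in the question; 1000 samples is an arbitrary number chosen for illustration:

```python
import numpy as np

# Same per-image shape as the question's training array after np.expand_dims.
shape = (1000, 28, 28, 1)
as_f64 = np.zeros(shape, dtype="float64")
as_f32 = as_f64.astype("float32")  # the conversion suggested above

print(as_f64.nbytes)  # 6272000
print(as_f32.nbytes)  # 3136000
```

In getData itself the change amounts to calling .astype("float32") instead of .astype("float64") on the images.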

Answer 2

Simple: for classification problems where the labels are integers, sparse_categorical_crossentropy is used; where the labels have been converted to one-hot encoding, categorical_crossentropy is used.
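That rule of thumb can be written as a check on the label array's shape. The helper below is hypothetical (Keras has no such function); it only returns the loss name to pass to model.compile:

```python
import numpy as np

def loss_for_labels(labels):
    """Hypothetical helper: choose a Keras loss name from the label array's shape.

    Integer class indices, shape (N,) or (N, 1) -> sparse_categorical_crossentropy
    One-hot rows, shape (N, num_classes)      -> categorical_crossentropy
    """
    labels = np.asarray(labels)
    if labels.ndim == 1 or (labels.ndim == 2 and labels.shape[1] == 1):
        return "sparse_categorical_crossentropy"
    return "categorical_crossentropy"

print(loss_for_labels(np.array([3, 0, 24])))    # sparse_categorical_crossentropy
print(loss_for_labels(np.eye(26)[[3, 0, 24]]))  # categorical_crossentropy
```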


2 answers

Answer 1
2 votes, accepted, 2020-07-21 09:43:43

Answer 2
0 votes, 2020-07-26 11:01:28