Error while training a deep learning model
So I designed a CNN and compiled it with the following parameters:
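import csv

import numpy as np
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.optimizers import RMSprop
from tensorflow.keras.preprocessing.image import ImageDataGenerator

training_file_loc = "8-SignLanguageMNIST/sign_mnist_train.csv"
testing_file_loc = "8-SignLanguageMNIST/sign_mnist_test.csv"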
def getData(filename):
    images = []
    labels = []
    with open(filename) as csv_file:
        file = csv.reader(csv_file, delimiter = ",")
        next(file, None)  # skip the header row
        for row in file:
            label = row[0]
            data = row[1:]
            img = np.array(data).reshape(28,28)
            images.append(img)
            labels.append(label)
    images = np.array(images).astype("float64")
    labels = np.array(labels).astype("float64")
    return images, labels
training_images, training_labels = getData(training_file_loc)
testing_images, testing_labels = getData(testing_file_loc)
print(training_images.shape, training_labels.shape)
print(testing_images.shape, testing_labels.shape)
training_images = np.expand_dims(training_images, axis = 3)
testing_images = np.expand_dims(testing_images, axis = 3)
training_datagen = ImageDataGenerator(
    rescale = 1/255,
    rotation_range = 45,
    width_shift_range = 0.2,
    height_shift_range = 0.2,
    shear_range = 0.2,
    zoom_range = 0.2,
    horizontal_flip = True,
    fill_mode = "nearest"
)
training_generator = training_datagen.flow(
    training_images,
    training_labels,
    batch_size = 64,
)
validation_datagen = ImageDataGenerator(
    rescale = 1/255,
    rotation_range = 45,
    width_shift_range = 0.2,
    height_shift_range = 0.2,
    shear_range = 0.2,
    zoom_range = 0.2,
    horizontal_flip = True,
    fill_mode = "nearest"
)
validation_generator = training_datagen.flow(
    testing_images,
    testing_labels,
    batch_size = 64,
)
model = tf.keras.Sequential([
    keras.layers.Conv2D(16, (3, 3), input_shape = (28, 28, 1), activation = "relu"),
    keras.layers.MaxPooling2D(2, 2),
    keras.layers.Conv2D(32, (3, 3), activation = "relu"),
    keras.layers.MaxPooling2D(2, 2),
    keras.layers.Flatten(),
    keras.layers.Dense(256, activation = "relu"),
    keras.layers.Dropout(0.25),
    keras.layers.Dense(512, activation = "relu"),
    keras.layers.Dropout(0.25),
    keras.layers.Dense(26, activation = "softmax")
])
model.compile(
    loss = "categorical_crossentropy",
    optimizer = RMSprop(lr = 0.001),
    metrics = ["accuracy"]
)
But when I run model.fit(), I get the following error:
ValueError: Shapes (None, 1) and (None, 24) are incompatible
After changing the loss function to sparse_categorical_crossentropy, the program worked fine.
I don't understand why this happened.
Can anyone explain this, and also the difference between those loss functions?
The issue is that categorical_crossentropy expects one-hot-encoded labels. For each sample it expects a tensor of length num_classes in which the element at index label is set to 1 and everything else is 0. Your labels are a single integer per sample (shape (None, 1)), so they cannot be matched against the model's per-class output, hence the shape error.
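To make the one-hot format concrete, here is a minimal, dependency-free sketch. The helper one_hot and the sample labels are made up for illustration; in practice a library routine such as tf.keras.utils.to_categorical does this for you.

```python
def one_hot(label, num_classes):
    """Return a list of length num_classes with a 1.0 at index `label`."""
    if not 0 <= label < num_classes:
        raise ValueError(f"label {label} is outside [0, {num_classes})")
    encoded = [0.0] * num_classes
    encoded[label] = 1.0
    return encoded


# Hypothetical integer labels, shaped like a CSV label column: one number per sample.
integer_labels = [3, 0, 25]
one_hot_labels = [one_hot(label, num_classes=26) for label in integer_labels]

print(len(integer_labels))           # 3 samples, one integer each
print(len(one_hot_labels[0]))        # 26 values per sample
print(one_hot_labels[0].index(1.0))  # 3, the original label
```

The first form (one number per sample) is what the CSV gives you; the second (26 numbers per sample) is what categorical_crossentropy wants to compare against a 26-way softmax.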
On the other hand, sparse_categorical_crossentropy uses integer labels directly (its use case is a large number of classes, where one-hot-encoded labels would waste memory on a lot of zeros). I believe, but I can't confirm this, that categorical_crossentropy is faster to run than its sparse counterpart.
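As a rough illustration that the two losses compute the same quantity and only differ in the label format they accept, here is a simplified pure-Python sketch for a single sample. These functions are my own stand-ins, not the Keras implementations (which work on batches and, as far as I know, add numerical safeguards); the probabilities are made up.

```python
import math


def categorical_crossentropy(one_hot_target, probs):
    """-sum(t_i * log(p_i)) over the classes; the target is a one-hot vector."""
    return -sum(t * math.log(p) for t, p in zip(one_hot_target, probs) if t > 0)


def sparse_categorical_crossentropy(label, probs):
    """-log(p[label]); the target is a single integer class index."""
    return -math.log(probs[label])


probs = [0.1, 0.7, 0.2]           # hypothetical softmax output for 3 classes
label = 1                         # integer form of the target
one_hot_target = [0.0, 1.0, 0.0]  # one-hot form of the same target

loss_dense = categorical_crossentropy(one_hot_target, probs)
loss_sparse = sparse_categorical_crossentropy(label, probs)
print(loss_dense, loss_sparse)    # both are -log(0.7), about 0.357
```

So switching between the two is purely a question of how the labels are stored, not of what is being optimized.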
For your case, with 26 classes, I'd recommend using the non-sparse version and transforming your labels to be one-hot encoded like so:
def getData(filename):
    images = []
    labels = []
    with open(filename) as csv_file:
        file = csv.reader(csv_file, delimiter = ",")
        next(file, None)
        for row in file:
            label = row[0]
            data = row[1:]
            img = np.array(data).reshape(28,28)
            images.append(img)
            labels.append(label)
    images = np.array(images).astype("float64")
    labels = np.array(labels).astype("float64")
    return images, tf.keras.utils.to_categorical(labels, num_classes=26) # you can omit num_classes to have it computed from the data
Side note: unless you have a reason to use float64 for the images, I'd switch to float32 (it halves the memory required for the dataset, and the model likely converts them to float32 as its first operation anyway).
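A back-of-the-envelope check of the "halves the memory" claim, using only the standard library. The image count is a made-up number for illustration, not the real dataset size, and the item sizes are those of a C double and a C float on your platform (8 and 4 bytes on typical systems).

```python
from array import array

n_images = 1000             # hypothetical dataset size, for illustration only
values_per_image = 28 * 28  # one value per pixel

bytes_per_float64 = array("d").itemsize  # C double
bytes_per_float32 = array("f").itemsize  # C float

size_float64 = n_images * values_per_image * bytes_per_float64
size_float32 = n_images * values_per_image * bytes_per_float32
print(size_float64, size_float32)  # 6272000 3136000 on typical platforms
```

In the question's code this would amount to changing astype("float64") to astype("float32") for the images.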
Simple: for classification problems where the output classes are integers, sparse_categorical_crossentropy is used; for those where the labels have been converted to one-hot-encoded labels, we use categorical_crossentropy.
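That rule of thumb could be sketched as a small helper that looks at the shape of the label array. pick_loss is a hypothetical function written for this answer, not part of Keras; it only returns the loss name you would pass to model.compile.

```python
def pick_loss(label_shape, num_classes):
    """Pick a loss name from the label array's shape (hypothetical helper)."""
    # Integer labels: shape (n,) or (n, 1), one class index per sample.
    if len(label_shape) == 1 or label_shape[-1] == 1:
        return "sparse_categorical_crossentropy"
    # One-hot labels: shape (n, num_classes).
    if label_shape[-1] == num_classes:
        return "categorical_crossentropy"
    raise ValueError(
        f"unexpected label shape {label_shape} for {num_classes} classes"
    )


print(pick_loss((64,), 26))     # sparse_categorical_crossentropy
print(pick_loss((64, 1), 26))   # sparse_categorical_crossentropy
print(pick_loss((64, 26), 26))  # categorical_crossentropy
```

With the labels from the question (one number per sample) this picks the sparse loss, which matches what made the program run.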