Why am I getting drastically different results when using softmax instead of sigmoid in the output layer of a CNN?
I have a simple model that classifies images of triangles and circles.
Code:
model = Sequential([
    Conv2D(16, 3, padding='same', activation='relu', input_shape=(150, 150, 3)),
    MaxPooling2D(),
    Conv2D(32, 3, padding='same', activation='relu'),
    MaxPooling2D(),
    Conv2D(64, 3, padding='same', activation='relu'),
    MaxPooling2D(),
    Flatten(),
    Dense(512, activation='relu'),
    Dense(1, activation='sigmoid'),
])
model.compile(optimizer='adam',
              loss='binary_crossentropy',
              metrics=['accuracy'])
model.fit(Xtr, ytr, epochs=3, batch_size=10)
The performance on the test set is:
But when I change the activation function in the output layer to softmax, i.e. the last layer becomes Dense(1, activation='softmax'), the model's performance becomes:
I tried different dataset splits and the results remained roughly the same (the model with softmax activation performed equally badly). What is the issue?
Using softmax with your current configuration actually forces the model to always choose only one class. That is likely why, in your experiment with softmax, recall is always zero for one class and one for the other.
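To see why, here is a minimal sketch (using NumPy, not Keras): softmax normalizes its inputs over the class axis, so with a single output unit it normalizes a one-element vector and always emits exactly 1, regardless of the logit.

```python
import numpy as np

def softmax(z):
    # numerically stable softmax over the last axis
    e = np.exp(z - np.max(z, axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

# With one output unit, the "probability" is always exactly 1,
# no matter what logit the network produces.
for logit in (-5.0, 0.0, 3.7):
    print(softmax(np.array([logit])))  # -> [1.]
```

So `Dense(1, activation='softmax')` predicts the same class with probability 1 for every input, which matches the degenerate recall you observed.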
First, you need to change the loss. binary_crossentropy is not meant to be used with softmax. If you change the loss to categorical_crossentropy and make the last Dense layer of size 2 (since you want to choose between two classes using softmax), you should get almost the same performance. That is, replace this part of the code:
    Dense(1, activation='sigmoid'),
])
model.compile(optimizer='adam',
              loss='binary_crossentropy',
              metrics=['accuracy'])
with this one:
    Dense(2, activation='softmax'),
])
model.compile(optimizer='adam',
              loss='categorical_crossentropy',
              metrics=['accuracy'])
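One caveat (this is an assumption about your data, since the question doesn't show ytr): categorical_crossentropy expects one-hot targets of shape (N, 2). If ytr holds integer labels 0/1, either one-hot encode it (keras.utils.to_categorical does this) or keep the integer labels and use loss='sparse_categorical_crossentropy' instead. A plain-NumPy sketch of the one-hot step:

```python
import numpy as np

# Hypothetical integer labels (0 = circle, 1 = triangle)
ytr = np.array([0, 1, 1, 0])

# One-hot encode for categorical_crossentropy;
# keras.utils.to_categorical(ytr, num_classes=2) is equivalent.
ytr_onehot = np.eye(2)[ytr]
print(ytr_onehot.shape)  # -> (4, 2)
```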
Putting it together, the full corrected model:
model = Sequential([
    Conv2D(16, 3, padding='same', activation='relu', input_shape=(150, 150, 3)),
    MaxPooling2D(),
    Conv2D(32, 3, padding='same', activation='relu'),
    MaxPooling2D(),
    Conv2D(64, 3, padding='same', activation='relu'),
    MaxPooling2D(),
    Flatten(),
    Dense(512, activation='relu'),
    # Dense(1, activation='sigmoid'),  # 1 node in the last layer
    # is why softmax collapsed to a single class
    Dense(2, activation='softmax'),  # 2 nodes, one per class
])
model.compile(optimizer='adam',
              loss='categorical_crossentropy',
              metrics=['accuracy'])
model.fit(Xtr, ytr, epochs=3, batch_size=10)
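As a side note on why the two setups give "almost the same performance": a 2-way softmax over logits (z0, z1) yields the same class-1 probability as a sigmoid applied to the logit difference z1 - z0, so the two parameterizations are mathematically equivalent up to optimization details. A quick numerical check (NumPy, illustrative values):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def softmax(z):
    e = np.exp(z - np.max(z))
    return e / e.sum()

# softmax([z0, z1])[1] = 1 / (1 + exp(z0 - z1)) = sigmoid(z1 - z0)
z0, z1 = 0.4, 1.9
p_softmax = softmax(np.array([z0, z1]))[1]
p_sigmoid = sigmoid(z1 - z0)
print(p_softmax, p_sigmoid)  # the two probabilities agree
```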