How many neurons should the last layer of a neural network have?

Question

I use the following code to classify movie reviews into three classes (-1 for negative, 0 for neutral, and 1 for positive). But is it true that the last layer should have only one output neuron for a three-class classification problem?

import tensorflow as tf
import numpy as np
import pandas as pd

csvfilename_train = 'train(cleaned).csv'
csvfilename_test = 'test(cleaned).csv'

# Read .csv files as pandas dataframes
df_train = pd.read_csv(csvfilename_train)
df_test = pd.read_csv(csvfilename_test)

train_sentences  = df_train['Comment'].values
test_sentences  = df_test['Comment'].values

# Extract labels from dataframes
train_labels = df_train['Sentiment'].values
test_labels = df_test['Sentiment'].values

vocab_size = 10000
embedding_dim = 16
max_length = 30
trunc_type = 'post'
oov_tok = '<OOV>'

from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences

tokenizer = Tokenizer(num_words = vocab_size, oov_token = oov_tok)
tokenizer.fit_on_texts(train_sentences)
word_index = tokenizer.word_index
sequences = tokenizer.texts_to_sequences(train_sentences)
padded = pad_sequences(sequences, maxlen = max_length, truncating = trunc_type)

test_sequences = tokenizer.texts_to_sequences(test_sentences)
test_padded = pad_sequences(test_sequences, maxlen = max_length)

model = tf.keras.Sequential([
    tf.keras.layers.Embedding(vocab_size, embedding_dim, input_length = max_length),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(6, activation = 'relu'),
    tf.keras.layers.Dense(1, activation = 'sigmoid'),
])
model.compile(loss = 'binary_crossentropy', optimizer = 'adam', metrics = ['accuracy'])

num_epochs = 10
model.fit(padded, train_labels, epochs = num_epochs, validation_data = (test_padded, test_labels))

When I change tf.keras.layers.Dense(1, activation = 'sigmoid') to tf.keras.layers.Dense(2, activation = 'sigmoid'), it gives me the following error:

---> 10 model.fit(padded, train_labels, epochs = num_epochs, validation_data = (test_padded,test_labels))
     ValueError: logits and labels must have the same shape ((None, 2) vs (None, 1))

Answer 1

You should have 3 neurons if you are classifying between 3 categories.

Also, you should use the 'softmax' activation for your final layer, assuming each observation belongs to exactly one class.

Next, you should use 'sparse_categorical_crossentropy' as the loss, since your labels are integer class indices rather than one-hot vectors. One-hot targets like [0,0,1], [0,1,0], [1,0,0] are optional; you can also use integer targets such as [1, 2, 0, 1, 2, 1, 0].

Finally, your targets should be [0, 1, 2] and not [-1, 0, 1], so I suggest you add 1 to your labels.

train_labels = df_train['Sentiment'].values + 1
test_labels = df_test['Sentiment'].values + 1
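
Putting the four points above together, a minimal sketch might look like the following. The random token ids and labels are stand-ins (an assumption, since the CSV files are not available here), and the explicit tf.keras.Input layer replaces the input_length argument so the sketch does not depend on a particular Keras version.

```python
import numpy as np
import tensorflow as tf

vocab_size = 10000
embedding_dim = 16
max_length = 30
num_classes = 3

# Stand-in data (assumption): random token ids and labels in {-1, 0, 1}
rng = np.random.default_rng(0)
padded = rng.integers(0, vocab_size, size=(64, max_length))
raw_labels = rng.integers(-1, 2, size=(64,))

# Shift labels from {-1, 0, 1} to the class indices {0, 1, 2}
train_labels = raw_labels + 1

model = tf.keras.Sequential([
    tf.keras.Input(shape=(max_length,)),
    tf.keras.layers.Embedding(vocab_size, embedding_dim),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(6, activation='relu'),
    # One unit per class; softmax turns the outputs into a distribution
    tf.keras.layers.Dense(num_classes, activation='softmax'),
])
# The sparse loss accepts integer class indices directly
model.compile(loss='sparse_categorical_crossentropy',
              optimizer='adam', metrics=['accuracy'])
model.fit(padded, train_labels, epochs=1, verbose=0)

# Each row holds 3 class probabilities; argmax gives the class index,
# and subtracting 1 maps it back to the original -1 / 0 / 1 sentiment
probs = model.predict(padded[:4], verbose=0)
predicted_sentiment = probs.argmax(axis=1) - 1
print(probs.shape, predicted_sentiment)
```

With real data, padded and train_labels would come from the tokenizer and dataframe in the question instead of the random arrays.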

This is what happens if the labels are [-1, 0, 1] instead of [0, 1, 2]:

import tensorflow as tf

sparse_entropy = tf.losses.SparseCategoricalCrossentropy()

a = tf.convert_to_tensor([[-1., 0., 1.]]) #+ 1
b = tf.convert_to_tensor([[.4, .2, .4], [.1, .7, .2], [.8, .1, .1]])

sparse_entropy(a, b)

nan

If you uncomment the + 1, which transforms the labels into [0, 1, 2], it works:

<tf.Tensor: shape=(), dtype=float32, numpy=1.1918503>
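
To see where that number comes from, here is a rough re-derivation in plain Python. It is only a sketch of the formula (the mean of the negative log of the probability assigned to the true class), not TensorFlow's actual implementation. The helper name sparse_cce and the ValueError for out-of-range labels are assumptions made for illustration; TensorFlow returned nan in the run above rather than raising.

```python
import math

# Predicted class probabilities from the example above (one row per sample)
probs = [[0.4, 0.2, 0.4], [0.1, 0.7, 0.2], [0.8, 0.1, 0.1]]

def sparse_cce(labels, probs):
    """Mean of -log(probability assigned to the true class index)."""
    num_classes = len(probs[0])
    for y in labels:
        # A label is used as an index into the probability row,
        # so it has to lie in 0 .. num_classes - 1
        if not 0 <= y < num_classes:
            raise ValueError(f"label {y} is not a valid class index")
    return sum(-math.log(p[y]) for y, p in zip(labels, probs)) / len(labels)

loss = sparse_cce([0, 1, 2], probs)
print(round(loss, 7))  # 1.1918503

try:
    sparse_cce([-1, 0, 1], probs)
except ValueError as err:
    print(err)
```

The label -1 has no matching output unit, which is why shifting the labels by 1 is needed.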

Answer 2

Short answer:

One-hot encode your training labels and use categorical crossentropy as the loss function.

Cause:

Your logits have shape (n,2) but your labels have shape (n,1).
With crossentropy (unless it is the sparse variant), both your logits and your labels should have shape (n,3).

Solution:

One-hot encode the training labels, which gives them shape (n,3).
Use categorical crossentropy with a final Dense layer of 3 units, which gives logits of shape (n,3).

Your model will start learning after this.
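
As a small sketch of the one-hot step, assuming the labels have already been shifted to 0, 1, 2, NumPy can build the (n,3) targets by indexing an identity matrix. The probability values are borrowed from the example in Answer 1; the variable names are illustrative only.

```python
import numpy as np

raw_labels = np.array([-1, 0, 1])      # hypothetical Sentiment values
class_ids = raw_labels + 1             # shift to class indices 0, 1, 2

# Row i of the identity matrix is the one-hot vector for class i
one_hot = np.eye(3, dtype=np.float32)[class_ids]
print(one_hot.shape)  # (3, 3)

probs = np.array([[0.4, 0.2, 0.4], [0.1, 0.7, 0.2], [0.8, 0.1, 0.1]])

# Categorical crossentropy on one-hot targets
categorical = -(one_hot * np.log(probs)).sum(axis=1).mean()
# Sparse variant on integer targets picks the same entries
sparse = -np.log(probs[np.arange(len(class_ids)), class_ids]).mean()
print(np.isclose(categorical, sparse))  # True
```

In Keras, tf.keras.utils.to_categorical(class_ids, num_classes=3) produces the same kind of one-hot array. Both losses give the same value here, so the choice mostly comes down to which label format is more convenient.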

Answer 3

You have 3 classes, so num_classes = 3. Your last layer should look like this:

tf.keras.layers.Dense(num_classes, activation = 'sigmoid'),

You will receive a np.array with 3 probabilities as output. Moreover, you should change your loss to categorical_crossentropy because you are not solving a binary problem.
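
One caveat may be worth sketching here: with 'sigmoid', each of the 3 outputs is squashed independently, so the values need not sum to 1, whereas 'softmax' normalizes them into a single distribution over the classes. The logits below are made-up numbers for illustration.

```python
import math

logits = [2.0, 1.0, 0.1]  # hypothetical raw outputs of the last layer

# Sigmoid treats every output unit on its own
sigmoid = [1 / (1 + math.exp(-z)) for z in logits]

# Softmax divides by the sum, so the outputs form one distribution
exps = [math.exp(z) for z in logits]
softmax = [e / sum(exps) for e in exps]

print(sum(sigmoid))   # greater than 1 for these logits
print(sum(softmax))   # 1.0 up to floating-point rounding
```

For single-label problems like the three sentiment classes, this is the usual reason to prefer 'softmax' on the final layer.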

3 answers

Answer 1: score 2, accepted, 2020-08-12 17:10:43
Answer 2: score 2, 2020-08-12 17:14:24
Answer 3: score 0, 2020-08-12 17:02:38
