
Text classification for more than 2 classes with python and keras

I'm trying to do classification with a neural network.

My data looks like this:

Sentence     category
sentence 1   0
sentence 2   1
sentence 3   2
sentence 4   3

Therefore I have 4 different categories. I separate the sentences and the labels into 2 different lists and put the first list (with the sentences) into a tokenizer.

Edit: Because someone asked about data samples, here is a link to the Dropbox (data_samples.txt) and some more explanations. 0 = not needed, 1 = title, 2 = name of the author, 3 = pages. The sentence is separated from the label with |. There are a lot of special characters (like -- >, ") because of the lemmatization, but the tokenizer skips these characters. The content is from different websites and I only use it in the context of study (not for money or with bad intentions).
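For orientation, here is a minimal sketch of how data in the described "sentence | label" format could be read into the two lists used below (the parsing details are an assumption, not part of the original post):

# Assumed format: one "sentence | label" pair per line, label is an integer 0-3
newLines = []   # sentences
labels = []     # integer categories

with open("data_samples.txt", encoding="utf-8") as f:
    for line in f:
        if "|" not in line:
            continue
        sentence, label = line.rsplit("|", 1)   # split on the last "|" only
        newLines.append(sentence.strip())
        labels.append(int(label.strip()))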

from numpy import array
from tensorflow.keras.preprocessing.sequence import pad_sequences

y_train = array(labels) # it has to be a numpy array, otherwise it will cause errors; labels is a list of integers
x_train = tokenizer.texts_to_sequences(newLines) # newLines is a list with all sentences
x_train = pad_sequences(x_train, maxlen=sequenceLength)
vocabSize = len(tokenizer.word_index) + 1
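The snippet above assumes a tokenizer that has already been fitted on the sentences and a chosen sequenceLength; a minimal sketch of that setup (the value of sequenceLength here is only illustrative):

from tensorflow.keras.preprocessing.text import Tokenizer

tokenizer = Tokenizer()            # default filters drop most punctuation/special characters
tokenizer.fit_on_texts(newLines)   # build the word index from all sentences

sequenceLength = 100               # illustrative; choose it to cover your typical sentence length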

And now I will train my model:

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, Conv1D, MaxPooling1D, Flatten, Dense

mymodel = Sequential()
mymodel.add(Embedding(input_dim=vocabSize, output_dim=100, input_length=sequenceLength))
mymodel.add(Conv1D(32, 3, padding='same', activation='relu'))
mymodel.add(MaxPooling1D())
mymodel.add(Flatten())
mymodel.add(Dense(250, activation='relu'))
mymodel.add(Dense(4, activation='softmax'))  # 4 output scores, one per category
mymodel.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
history = mymodel.fit(x_train, y_train, epochs=30, batch_size=8)

But if I start the program I get the following error:

  • ValueError: Shapes (None, 1) and (None, 4) are incompatible

I know it is because of the last Dense layer. If I write 1 instead of 4, I don't get an error. But I thought the last Dense layer needs a 4 as a parameter, because I have 4 categories. I think the last Dense layer with a 1 instead of a 4 is not correct, because the loss is 0.0000e+00, and that doesn't look right.
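A quick way to see the mismatch (a small diagnostic sketch, not part of the original post) is to print the label shape next to the model summary:

print(x_train.shape)   # (N, sequenceLength)
print(y_train.shape)   # (N,) -- one integer label per sample, not one-hot vectors of length 4
mymodel.summary()      # the last layer outputs shape (None, 4)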

What did I do wrong?

Replace loss="categorical_crossentropy" with loss="sparse_categorical_crossentropy".

The model produces one score per class; the sparse_ loss knows how to compare a single integer label in the range [0-3] to these 4 score values.
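In code, only the compile call changes; alternatively (not part of the original answer), the integer labels can be one-hot encoded with to_categorical so that categorical_crossentropy works as well:

# Option 1: keep the integer labels (0-3) and use the sparse loss
mymodel.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])
history = mymodel.fit(x_train, y_train, epochs=30, batch_size=8)

# Option 2: one-hot encode the labels and keep categorical_crossentropy
from tensorflow.keras.utils import to_categorical
y_train_onehot = to_categorical(y_train, num_classes=4)   # shape (N, 4)
mymodel.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
history = mymodel.fit(x_train, y_train_onehot, epochs=30, batch_size=8)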
