
keras categorical and binary crossentropy

After working through some examples and tutorials with Keras, I am somewhat confused about which cross-entropy function I should use in my project. In my case I want to predict multiple labels, such as (positive, negative and neutral), for online comments with an LSTM model. The labels have been converted to one-hot vectors with the to_categorical method, which is also documented in Keras:

(...) when using the categorical_crossentropy loss, your targets should be in categorical format (e.g. if you have 10 classes, the target for each sample should be a 10-dimensional vector that is all zeros except for a 1 at the index corresponding to the class of the sample).

The one-hot array looks as follows:

array([[1., 0., 0.],
       [1., 0., 0.],
       [0., 0., 1.],
       ...])
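
For reference, here is a minimal sketch of how such an array is produced; the integer encoding (0 = positive, 1 = negative, 2 = neutral) is an assumption for illustration:

import numpy as np
from keras.utils import to_categorical

# hypothetical integer labels: 0 = positive, 1 = negative, 2 = neutral
labels = np.array([0, 0, 2])

# one row per sample, one column per class
one_hot = to_categorical(labels, num_classes=3)
print(one_hot)
# [[1. 0. 0.]
#  [1. 0. 0.]
#  [0. 0. 1.]]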

Because there are multiple labels, I would prefer to use categorical_crossentropy. I implemented a model with this loss, but its accuracy was only just above 20%. Using binary_crossentropy with a sigmoid activation, my accuracy reached 80%. I am really confused, because some people have argued the following:

the accuracy computed with the Keras method "evaluate" is just plain wrong when using binary_crossentropy with more than 2 labels

whereas others have already implemented high-performing models with binary cross-entropy and multiple labels, which is essentially the same workflow.

We want the probability of each class. So we use sigmoid on the final layer, which gives outputs in the range 0 to 1. If our aim were to find the class, then we would have used softmax.

So I just want to know whether there would be any problems if I chose binary_crossentropy, as in the following link, to predict the outcome class.
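
As an aside on the first quoted claim: it stems from how Keras resolves the generic metrics=['accuracy'] argument. With loss='binary_crossentropy', 'accuracy' becomes binary_accuracy, which averages correctness over the three output units rather than checking whether the predicted class is right, so the 80% and 20% figures are not directly comparable. A minimal sketch of requesting the multiclass metric explicitly (the small Dense model here is a hypothetical stand-in for the LSTM):

from keras.models import Sequential
from keras.layers import Dense
from keras import metrics

# stand-in model: 10 input features, 3 sigmoid outputs
model = Sequential([Dense(3, activation='sigmoid', input_shape=(10,))])

# request categorical_accuracy explicitly; the plain string 'accuracy'
# is resolved to binary_accuracy when the loss is binary_crossentropy,
# which inflates the score on one-hot multiclass targets
model.compile(loss='binary_crossentropy',
              optimizer='adam',
              metrics=[metrics.categorical_accuracy])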

You are confusing multilabel and multiclass classification.

In multiclass classification, your classifier chooses one class out of N classes. Usually, the last layer of a neural network that does multiclass classification is a softmax layer. That means each output row sums to 1 (it forms a probability distribution over these N classes).
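
A minimal sketch of that multiclass setup for the three-way sentiment task; the vocabulary size and layer widths are placeholders, not taken from the question:

from keras.models import Sequential
from keras.layers import Embedding, LSTM, Dense

model = Sequential([
    Embedding(input_dim=10000, output_dim=128),  # placeholder vocabulary size
    LSTM(64),
    Dense(3, activation='softmax'),              # one probability per class, rows sum to 1
])
model.compile(loss='categorical_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])
# targets: one-hot rows, e.g. from to_categorical as shown above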

Multilabel classification, on the other hand, consists of making a binary choice for each of N questions. It makes sense to use binary cross-entropy for that, since the way most neural network frameworks work makes it behave as if you computed the average binary cross-entropy over these binary tasks. In neural networks that are multilabel classifiers, sigmoid is used as the last layer (the Kaggle kernel you linked uses sigmoid as the activation in the last layer).
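
And, for contrast, a minimal sketch of the multilabel setup, where each of the N outputs answers an independent yes/no question (again with placeholder sizes):

from keras.models import Sequential
from keras.layers import Embedding, LSTM, Dense

model = Sequential([
    Embedding(input_dim=10000, output_dim=128),  # placeholder vocabulary size
    LSTM(64),
    Dense(3, activation='sigmoid'),              # each output independently in [0, 1]
])
model.compile(loss='binary_crossentropy',
              optimizer='adam')
# the loss is the mean binary cross-entropy over the 3 independent outputs;
# targets are multi-hot rows such as [1, 0, 1], not necessarily one-hot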
