How to calculate class weights for an output with 4 neurons in Keras?

Question

I've seen how to do class weight imbalance correction for a single classification. But in my case, my output layer is:

model.add(Dense(4, activation='sigmoid'))

My target is a DataFrame that has:

       0  1  2  3
0      1  1  0  0
1      0  0  0  0
2      1  1  1  0
3      1  1  0  0
4      1  1  0  0
5      1  1  0  0
6      1  0  0  0
...   .. .. .. ..
14989  1  1  1  1
14990  1  1  1  0
14991  1  1  1  1
14992  1  1  1  0

[14993 rows x 4 columns]

My predictions can take one of 5 possible values:

[[0, 0, 0, 0],
[1, 0, 0, 0],
[1, 1, 0, 0],
[1, 1, 1, 0],
[1, 1, 1, 1]]

However, those classes are certainly not balanced. I've seen how to compute the class weights if I have 1 target output with a softmax, but this is slightly different.

Specifically,

model.fit(..., class_weight=weights)

How can I define weights in this case?

Answer 1

Possible solution

IMO you should use an almost standard categorical_crossentropy and output logits from the network, which will be mapped in the loss function to values [0,1,2,3,4] using the argmax operation (the same procedure will be applied to the one-hot-encoded labels; see the last part of this answer for an example).

Using weighted crossentropy, you can treat incorrect predictions differently based on the predicted vs correct values, as you indicated in the comments.

All you have to do is take the absolute value of the difference between the correct and predicted values and multiply the loss by it; see the example below:

Let's map each encoding to its unary value (this can be done using argmax, as seen later):

[0, 0, 0, 0] -> 0
[1, 0, 0, 0] -> 1
[1, 1, 0, 0] -> 2
[1, 1, 1, 0] -> 3
[1, 1, 1, 1] -> 4
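
The mapping above can be sketched in NumPy. Because each row is a run of ones followed by zeros, the row sum gives the unary value directly, and that index can then be one-hot encoded over 5 classes (argmax on the one-hot row recovers the index). The helper names `thermometer_to_class` and `to_one_hot` are illustrative only and are not part of Keras or of the original answer.

```python
import numpy as np


def thermometer_to_class(rows):
    """Map thermometer-coded rows such as [1, 1, 0, 0] to class indices (here 2)."""
    return np.asarray(rows).sum(axis=1)


def to_one_hot(indices, num_classes):
    """One-hot encode integer class indices."""
    return np.eye(num_classes, dtype=int)[indices]


encodings = np.array(
    [
        [0, 0, 0, 0],
        [1, 0, 0, 0],
        [1, 1, 0, 0],
        [1, 1, 1, 0],
        [1, 1, 1, 1],
    ]
)

classes = thermometer_to_class(encodings)
one_hot = to_one_hot(classes, num_classes=5)

print(classes)                 # class index per row
print(one_hot.argmax(axis=1))  # argmax on one-hot labels recovers the same indices
```

With a DataFrame target, something like `thermometer_to_class(target.to_numpy())` should give the class index per sample, assuming every row is a valid thermometer code.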

And let's make some random targets and model predictions to see the essence:

   correct  predicted with Softmax
0        0                       4
1        4                       3
2        3                       3
3        1                       4
4        3                       1
5        1                       0

Now, when you subtract correct and predicted and take the absolute value, you essentially get a weighting column like this:

   correct  predicted with Softmax  weights
0        0                       4        4
1        4                       3        1
2        3                       3        0
3        1                       4        3
4        3                       1        2
5        1                       0        1

As you can see, a prediction of 0 when the true target is 4 will be weighted 4 times more than a prediction of 3 with the same target of 4, and that is essentially what you want, IIUC.
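
As a quick check of the arithmetic, here is a minimal NumPy sketch that computes the weights for the example targets and predictions listed earlier. The variable names are illustrative.

```python
import numpy as np

# Example targets and predictions from the table above
correct = np.array([0, 4, 3, 1, 3, 1])
predicted = np.array([4, 3, 3, 4, 1, 0])

# Weight of each sample is the absolute distance between the two class indices
weights = np.abs(correct - predicted)
print(weights)  # [4 1 0 3 2 1]

# Predicting 0 for a true 4 is weighted 4x more than predicting 3 for a true 4
ratio = abs(4 - 0) / abs(4 - 3)
print(ratio)  # 4.0
```

Note that the third sample, which is predicted correctly, gets weight 0.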

As Daniel Möller indicates in his answer, I would advise you to create a custom loss function as well, but a slightly simpler one:

import tensorflow as tf

# Output logits from your network, not the values after softmax activation
def weighted_crossentropy(labels, logits):
    return tf.losses.softmax_cross_entropy(
        labels,
        logits,
        weights=tf.abs(tf.argmax(logits, axis=1) - tf.argmax(labels, axis=1)),
    )

You should use this loss in your model.compile as well; I think there is no need to reiterate points already made.

Disadvantages of this solution:

For correct predictions the weight, and therefore the gradient, will be equal to zero, which means it will be harder for the network to strengthen connections (push logits towards +inf/-inf).
The above can be mitigated by adding random noise to each weighted loss, which would also act as additional regularization and might help.
A better solution might be to exclude from weighting the cases where the prediction equals the target (or make their weight 1); this would not add randomization to the network optimization.
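
A minimal sketch of that last option, assuming correct predictions should get weight 1 instead of 0. The names `distance_weights` and `floor` are hypothetical and not from the original answer.

```python
import numpy as np


def distance_weights(correct, predicted, floor=1):
    """Absolute class distance, but never below `floor`, so that correct
    predictions still contribute a non-zero loss (and gradient)."""
    return np.maximum(np.abs(correct - predicted), floor)


correct = np.array([0, 4, 3, 1, 3, 1])
predicted = np.array([4, 3, 3, 4, 1, 0])

floored = distance_weights(correct, predicted)
print(floored)  # [4 1 1 3 2 1]
```

In the TensorFlow loss, the analogous change would presumably be wrapping the weights expression in `tf.maximum(..., 1)`; that variant is an assumption and has not been tested here.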

Advantages of this solution:

You can easily add weighting for an imbalanced dataset (e.g. certain classes occurring more often).
It maps cleanly to the existing API.
It is conceptually simple and remains in the classification realm.
Your model cannot predict nonexistent classification values. With your multi-target setup it could predict e.g. [1, 0, 1, 0]; that cannot happen with the approach above. Fewer degrees of freedom should help training and remove the chance of nonsensical predictions (if I understood your problem description correctly).
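
To illustrate the first advantage, here is one possible way to derive imbalance weights from the class indices. It uses the common "balanced" heuristic, n_samples / (n_classes * count), which is an assumption on my part and not something the answer prescribes; `inverse_frequency_weights` is a hypothetical helper.

```python
import numpy as np


def inverse_frequency_weights(class_indices, num_classes):
    """Weight per class: n_samples / (num_classes * count).
    Classes absent from the data get weight 0."""
    counts = np.bincount(class_indices, minlength=num_classes)
    safe_counts = np.where(counts == 0, 1, counts)  # avoid division by zero
    weights = len(class_indices) / (num_classes * safe_counts)
    weights[counts == 0] = 0.0
    return weights


correct = np.array([0, 4, 3, 1, 3, 1])  # true class index per sample

class_w = inverse_frequency_weights(correct, num_classes=5)
sample_w = class_w[correct]  # look up the weight of each sample's true class

print(class_w)   # rarer classes get larger weights
print(sample_w)
```

These per-sample weights could then be multiplied with the distance-based weights, or passed to `fit` through its `sample_weight` argument.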

Additional discussion is provided in the chat room linked in the comments.

Example network with custom loss

Here is an example network with the custom loss function defined above. Your labels have to be one-hot-encoded in order for it to work correctly.

import keras
import numpy as np
import tensorflow as tf

# You could actually make it a lambda function as well
def weighted_crossentropy(labels, logits):
    return tf.losses.softmax_cross_entropy(
        labels,
        logits,
        weights=tf.abs(tf.argmax(logits, axis=1) - tf.argmax(labels, axis=1)),
    )


model = keras.models.Sequential(
    [
        keras.layers.Dense(32, input_shape=(10,)),
        keras.layers.Activation("relu"),
        keras.layers.Dense(10),
        keras.layers.Activation("relu"),
        keras.layers.Dense(5),
    ]
)

data = np.random.random((32, 10))
labels = keras.utils.to_categorical(np.random.randint(5, size=(32, 1)))

model.compile(optimizer="rmsprop", loss=weighted_crossentropy)
model.fit(data, labels, batch_size=32)

Answer 2

(Removed) First, you should fix your one-hot encoding:

(Removed) pd.get_dummies(target)

Calculate each class weight by counting how often each unique target row occurs and dividing by target.shape[0], which gives proportions:

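import numpy as np

# Example targets; in practice use the full (samples, 4) target array
target = np.array([[0, 0, 0, 0], [1, 0, 0, 0], [1, 1, 0, 0], [1, 1, 1, 0], [1, 1, 1, 1]])

# Count how often each distinct row (class) occurs
unique_rows, counts = np.unique(target, axis=0, return_counts=True)

# Map class index -> proportion of samples belonging to that class
class_weight = {i: count / target.shape[0] for i, count in enumerate(counts)}


model.fit(..., class_weight=class_weight)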

Answer 3

Considering you have your targets (ground truth y) with shape (samples, 4), you can simply:

positives = targetsAsNumpy.sum(axis=0)
totals = len(targetsAsNumpy)

negativeWeights = positives / totals
positiveWeights = 1 - negativeWeights
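
For a concrete feel, here is the same computation on a tiny made-up target array (the four rows are illustrative data, not the asker's). Outputs that are rarely positive end up with a large positive weight and a small negative weight.

```python
import numpy as np

# Hypothetical miniature target with shape (samples, 4)
targets = np.array(
    [
        [1, 1, 0, 0],
        [0, 0, 0, 0],
        [1, 1, 1, 0],
        [1, 0, 0, 0],
    ]
)

positives = targets.sum(axis=0)  # number of ones per output neuron
totals = len(targets)            # number of samples

negative_weights = positives / totals    # applied where the target is 0
positive_weights = 1 - negative_weights  # applied where the target is 1

print(negative_weights)  # [0.75 0.5  0.25 0.  ]
print(positive_weights)  # [0.25 0.5  0.75 1.  ]
```

With a pandas DataFrame target, `target.to_numpy()` would give the array used here.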

The class weights in the fit method are meant for categorical problems (only one correct class).

I suggest you create a custom loss with these, supposing you are using binary_crossentropy.

import keras.backend as K

posWeightsK = K.constant(positiveWeights.reshape((1,4)))
negWeightsK = K.constant(negativeWeights.reshape((1,4)))

def weightedLoss(yTrue, yPred):

    loss = K.binary_crossentropy(yTrue, yPred)
    loss = K.switch(K.greater(yTrue, 0.5), loss * posWeightsK, loss * negWeightsK)
    return K.mean(loss) #optionally K.mean(loss, axis=-1) for further customization

Use this loss in the model:

model.compile(loss = weightedLoss, ...)
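
To sanity-check what the weighted loss computes, here is a plain NumPy reference sketch of the same idea. It is not the Keras implementation; `weighted_bce`, `eps`, and the example weights are assumptions made for illustration.

```python
import numpy as np


def weighted_bce(y_true, y_pred, pos_w, neg_w, eps=1e-7):
    """Binary crossentropy per output, scaled by pos_w where the target
    is 1 and by neg_w where it is 0, then averaged over all elements."""
    y_pred = np.clip(y_pred, eps, 1 - eps)  # keep log() finite
    bce = -(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))
    weights = np.where(y_true > 0.5, pos_w, neg_w)
    return (bce * weights).mean()


pos_w = np.array([0.25, 0.5, 0.75, 1.0])
neg_w = np.array([0.75, 0.5, 0.25, 0.0])

y_true = np.array([[1.0, 0.0, 0.0, 0.0]])
y_pred = np.array([[0.5, 0.5, 0.5, 0.5]])

loss = weighted_bce(y_true, y_pred, pos_w, neg_w)
print(loss)
```

Here every element has crossentropy ln 2 and the selected weights are 0.25, 0.5, 0.25 and 0, so the result is ln 2 / 4.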

Answer 4

Per-neuron errors

For this value encoding (unary, also called 'thermometer code') you can simply measure the error on each value separately and add them, using e.g. binary_crossentropy or even a mean squared / mean absolute error metric. Given this output, it's not really a classification problem; it's a discrete representation of a regression task. But such representations are effective in certain cases, e.g. as the paper Thermometer Encoding: One Hot Way To Resist Adversarial Examples describes.

While such separate error measurements don't ensure that 'invalid' outputs (e.g. [1 0 0 0 1]) are impossible, such outputs will be very unlikely for any well-fit network, and the approach does have the property that, if the correct value is [1 1 1 1 0], then a prediction of [1 1 0 0 0] is "twice as wrong" as a prediction of [1 1 1 0 0]. And you don't need to adjust the 'class weights' to achieve these results.
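
The "twice as wrong" property can be checked with a small NumPy sketch that sums the absolute per-neuron errors. The function name `total_abs_error` is illustrative.

```python
import numpy as np


def total_abs_error(target, prediction):
    """Sum of absolute per-neuron errors between two thermometer codes."""
    return np.abs(np.asarray(target) - np.asarray(prediction)).sum()


correct = [1, 1, 1, 1, 0]
near_miss = [1, 1, 1, 0, 0]  # off by one step
far_miss = [1, 1, 0, 0, 0]   # off by two steps

err_near = total_abs_error(correct, near_miss)
err_far = total_abs_error(correct, far_miss)

print(err_near, err_far)  # 1 2
```

The same ordering holds for summed squared error on hard 0/1 outputs; with sigmoid probabilities the ratio is no longer exactly 2, but errors still grow with the distance.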
