Adding Class Weights for imbalanced dataset in Convolutional Neural Network

I have a dataset of images that has the following distribution:

  • Class 0: 73.5%
  • Class 1: 7%
  • Class 2: 15%
  • Class 3: 2.5%
  • Class 4: 2%

I think I need to add class weights to make up for the small number of images in classes 1, 2, 3 and 4.

I have tried calculating the class weights by dividing the share of class 0 by the share of class 1, the share of class 0 by the share of class 2, and so forth.

I'm assuming that class 0 gets a weight of 1, as it doesn't need to be scaled? Not sure if that is correct though.

class_weights = np.array([1, 10.5, 4.9, 29.4, 36.75]) 

and added them to my fit function:

model.fit(x_train, y_train, batch_size=batch_size, class_weight=class_weights, epochs=epochs, validation_data=(x_test, y_test))

I'm unsure whether I have calculated the weights correctly, and whether this is even how it is supposed to be done.

Hopefully someone can help clarify this.

First of all, make sure to pass a dictionary, since the class_weight parameter takes a dictionary that maps class indices to weights.
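A minimal sketch of that conversion, assuming the labels are the integer class indices 0 to 4 and the weights start out as a flat sequence as in the question (the variable names are illustrative):

```python
# Weights as they appeared in the question, indexed by class.
weights = [1, 10.5, 4.9, 29.4, 36.75]

# Keys are the integer class indices, not strings such as "0" or "class_0".
class_weight = {index: float(weight) for index, weight in enumerate(weights)}

print(class_weight)
```

The resulting dictionary is what would be passed as class_weight=class_weight in the model.fit call from the question.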

Second, the point of weighting the classes is as follows. Let's say that you have a binary classification problem where class_1 has 1000 instances and class_2 has 100 instances. Since you want to make up for the imbalanced data, you can set the weights as:

class_weights = {0: 1, 1: 10}  # 0 = class_1, 1 = class_2 (Keras expects integer class indices as keys)

In other words, if the model makes a mistake on a sample whose true label is class_2, it is penalized 10 times more than for a mistake on a sample whose true class is class_1. You want something like this because, given the class distribution in the data, the model has an inherent tendency to overfit to class_1 (that is, to favor it), since it is overrepresented. By setting the class weights you impose an implicit constraint on the model: 10 wrong predictions on instances of class_1 are as bad as 1 wrong prediction on an instance of class_2.
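The 10-to-1 trade-off can be checked numerically. The sketch below assumes the loss is a per-sample cross-entropy multiplied by the weight of the sample's true class, which is a common way to apply class weights; the function name and the probability 0.2 are made up for illustration.

```python
import math

# Hypothetical weights from the binary example: index 0 = class_1, index 1 = class_2.
class_weight = {0: 1.0, 1: 10.0}

def weighted_loss(true_class, predicted_prob_of_true_class):
    """Cross-entropy for one sample, scaled by the weight of its true class."""
    return class_weight[true_class] * -math.log(predicted_prob_of_true_class)

# The same poor prediction (probability 0.2 for the true class) on each class.
loss_class_1 = weighted_loss(0, 0.2)
loss_class_2 = weighted_loss(1, 0.2)

# Ten such mistakes on class_1 cost as much as one on class_2.
print(10 * loss_class_1, loss_class_2)
```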

With that said, you can set the class weights however you want; there is no single right or wrong way to do it. The way you set the weights seems reasonable to me.
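For reference, the ratio-based scheme from the question can be written out as below. This is only a sketch of that one heuristic (share of the largest class divided by the share of each class), using the percentages given in the question; other weighting schemes are equally possible.

```python
# Class shares in percent, taken from the question.
class_share = {0: 73.5, 1: 7.0, 2: 15.0, 3: 2.5, 4: 2.0}

# Weight = share of the largest class divided by the share of this class.
largest_share = max(class_share.values())
class_weight = {label: largest_share / share for label, share in class_share.items()}

for label, weight in class_weight.items():
    print(label, round(weight, 2))
```

This reproduces the values in the array from the question, with the majority class at weight 1.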

Please see this answer for a proper solution: https://datascience.stackexchange.com/a/18722

I understand that you are trying to set class weights, but also consider image augmentation to generate more images for the underrepresented classes.
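As a toy illustration of the idea, the sketch below mirrors an image to create one extra sample for a rare class. It assumes images are stored as plain lists of pixel rows and that a mirrored image is still a valid example for the task; in a real pipeline an image library's augmentation routines would normally be used instead of this hypothetical helper.

```python
def flip_horizontal(image):
    """Return a mirrored copy of an image stored as a list of pixel rows."""
    return [list(reversed(row)) for row in image]

# Toy 2x3 grayscale "image" standing in for a sample from a rare class.
rare_class_image = [[0, 1, 2],
                    [3, 4, 5]]

# Each original yields one extra training sample with the same label.
augmented = [rare_class_image, flip_horizontal(rare_class_image)]
print(augmented[1])
```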

I solved the problem, thank you so much gorjan.

class_weight = {0: 1.0,
            1: 10.5,
            2: 4.8,
            3: 29.5,
            4: 36.4}

What did the trick was writing the class names as plain integers (0, 1) instead of quoted strings ("0", "1") :-) and using a dict, as you suggested, instead of the np array.
