Adding Class Weights for an Imbalanced Dataset in a Convolutional Neural Network
I have a dataset of images that has the following distribution:
I think I need to add class weights to make up for the small number of images in classes 1, 2, 3 and 4.
I have tried calculating the class weights by dividing the count of class 0 by the count of class 1, class 0 by class 2, and so forth.
I'm assuming that class 0 gets a weight of 1, as it doesn't need to be scaled? Not sure if that is correct, though.
class_weights = np.array([1, 10.5, 4.9, 29.4, 36.75])
and added them to my fit function:
model.fit(x_train, y_train, batch_size=batch_size, class_weight=class_weights, epochs=epochs, validation_data=(x_test, y_test))
I'm unsure whether I have calculated the weights correctly, and whether this is even how it is supposed to be done.
Hopefully someone can help clarify it.
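The ratio approach described in the question (count of class 0 divided by the count of each class) can be computed from the label counts instead of by hand. A minimal sketch, assuming class 0 is the majority/reference class; the function name `ratio_class_weights` and the counts are hypothetical, since the actual distribution is not shown:

```python
import numpy as np


def ratio_class_weights(counts):
    """Weight class i by counts[0] / counts[i], as described in the question.

    Assumes class 0 is the reference (majority) class, so it gets weight 1.0.
    """
    counts = np.asarray(counts, dtype=float)
    weights = counts[0] / counts
    # Return a dict mapping integer class index -> float weight.
    return {i: float(w) for i, w in enumerate(weights)}


# Hypothetical counts for illustration only.
counts = [1000, 100, 250, 50, 40]
print(ratio_class_weights(counts))  # {0: 1.0, 1: 10.0, 2: 4.0, 3: 20.0, 4: 25.0}
```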
First of all, make sure to pass a dictionary, since the class_weight parameter of fit() takes a dictionary mapping each class index to its weight, not an array.
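For example, the array from the question can be turned into that dictionary with `enumerate`. This sketch assumes that position i in the array holds the weight for integer class label i (i.e. the label columns are ordered 0 to 4); the `model.fit` call is left commented out because the model and data are not defined here:

```python
import numpy as np

# The array from the question: index i holds the weight for class i.
class_weights = np.array([1, 10.5, 4.9, 29.4, 36.75])

# Build {class_index: weight} with plain Python ints as keys and floats as values.
class_weight = {i: float(w) for i, w in enumerate(class_weights)}
print(class_weight)  # {0: 1.0, 1: 10.5, 2: 4.9, 3: 29.4, 4: 36.75}

# model.fit(x_train, y_train, batch_size=batch_size,
#           class_weight=class_weight, epochs=epochs,
#           validation_data=(x_test, y_test))
```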
Second, the point of weighting the classes is as follows. Let's say that you have a binary classification problem where class_1 has 1000 instances and class_2 has 100 instances. Since you want to make up for the imbalanced data, you can set the weights as:
class_weights={"class_1": 1, "class_2": 10}
In other words, if the model makes a mistake on a sample whose true label is class_2, it is penalized 10 times more than if it makes a mistake on a sample whose true class is class_1. You want this because, given the class distribution in the data, the model has an inherent tendency to overfit on class_1, which is overrepresented by default. By setting the class weights, you impose an implicit constraint on the model: 10 wrong predictions on instances of class_1 are as bad as 1 wrong prediction on an instance of class_2.
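To make the "penalized 10 times more" statement concrete, here is a rough NumPy sketch of a weighted categorical cross-entropy in which each sample's loss is multiplied by the weight of its true class. It illustrates the idea and is not Keras' internal code; `weighted_cross_entropy` is a hypothetical helper, and class_1/class_2 are mapped to indices 0 and 1:

```python
import numpy as np


def weighted_cross_entropy(y_true, y_prob, class_weight):
    """Mean cross-entropy where each sample's loss is scaled by the weight
    of its true class (an illustration of how per-class weights act)."""
    y_true = np.asarray(y_true)
    y_prob = np.asarray(y_prob, dtype=float)
    # Per-sample cross-entropy: -log(probability assigned to the true class).
    per_sample = -np.log(y_prob[np.arange(len(y_true)), y_true])
    # Look up each sample's weight from its integer true label.
    weights = np.array([class_weight[label] for label in y_true], dtype=float)
    return float(np.mean(weights * per_sample))


class_weight = {0: 1.0, 1: 10.0}  # class_1 -> index 0, class_2 -> index 1

# The same mistake (only 0.2 probability on the true class), once per class.
mistake_on_class_1 = weighted_cross_entropy([0], [[0.2, 0.8]], class_weight)
mistake_on_class_2 = weighted_cross_entropy([1], [[0.8, 0.2]], class_weight)
print(mistake_on_class_2 / mistake_on_class_1)  # approximately 10
```

Because both samples put the same probability on their true class, the only difference in loss is the class weight, so the ratio comes out as 10.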
With that said, you can set the class weights however you want; there is no single right or wrong way to do it. The way you set the weights seems reasonable to me.
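One common alternative is the "balanced" heuristic, n_samples / (n_classes * count_c), which is the formula scikit-learn documents for `compute_class_weight` with `class_weight='balanced'`. Below is a plain NumPy sketch of that formula; the helper name and labels are hypothetical, and it assumes integer labels 0..K-1 with every class present (a zero count would cause a division by zero):

```python
import numpy as np


def balanced_class_weights(labels):
    """Weight class c by n_samples / (n_classes * count_c)."""
    labels = np.asarray(labels)
    counts = np.bincount(labels)  # assumes integer labels 0..K-1
    n_classes = len(counts)
    weights = len(labels) / (n_classes * counts)
    return {i: float(w) for i, w in enumerate(weights)}


# Hypothetical labels matching the 1000 vs 100 example above.
labels = np.array([0] * 1000 + [1] * 100)
print(balanced_class_weights(labels))  # {0: 0.55, 1: 5.5}
```

For the 1000/100 example this keeps the same 1:10 ratio as above, just on a different overall scale (the weights average out closer to 1).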
Please see this answer for a proper solution: https://datascience.stackexchange.com/a/18722
I understand that you are trying to set class weights, but also consider image augmentation to generate more images for the underrepresented classes.
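A minimal sketch of that idea, assuming images are stored as a NumPy array of shape (n, height, width, channels) with integer labels: each minority class is topped up with horizontally flipped copies of randomly chosen images from that class. The helper `oversample_with_flips` is hypothetical and uses flips only; in practice you would likely use richer transforms (for example Keras preprocessing layers such as `RandomFlip` or `RandomRotation`) and apply augmentation to the training split only:

```python
import numpy as np


def oversample_with_flips(x, y, target_count, rng=None):
    """Grow every class in (x, y) to at least target_count samples by adding
    horizontally flipped copies of randomly chosen images of that class.

    x: array of shape (n, height, width, channels); y: integer labels, shape (n,).
    """
    rng = np.random.default_rng() if rng is None else rng
    xs, ys = [x], [y]
    for c in np.unique(y):
        idx = np.flatnonzero(y == c)
        deficit = target_count - len(idx)
        if deficit <= 0:
            continue
        picks = rng.choice(idx, size=deficit, replace=True)
        # Flip along the width axis (axis 2 in NHWC layout).
        xs.append(x[picks][:, :, ::-1, :])
        ys.append(np.full(deficit, c, dtype=y.dtype))
    return np.concatenate(xs), np.concatenate(ys)


# Hypothetical toy data: 10 images of class 0, 3 images of class 1.
rng = np.random.default_rng(0)
x = rng.random((13, 8, 8, 3))
y = np.array([0] * 10 + [1] * 3)
x_bal, y_bal = oversample_with_flips(x, y, target_count=10, rng=rng)
print(x_bal.shape, np.bincount(y_bal))  # (20, 8, 8, 3) [10 10]
```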
I solved the problem, thank you so much, gorjan.
class_weight = {0: 1.0,
1: 10.5,
2: 4.8,
3: 29.5,
4: 36.4}
Writing the keys without quotes (0 instead of "0", 1 instead of "1") is what did the trick :-), along with using a dict as you suggested instead of the np array.
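Why the quotes mattered: the Keras documentation describes `class_weight` as a dictionary mapping class indices (integers) to weights, and in Python the string "0" and the integer 0 are different dictionary keys. A small sketch with hypothetical one-hot labels, assuming the class index of each sample is the argmax of its one-hot row:

```python
import numpy as np

# Hypothetical one-hot labels for 4 samples over 5 classes.
y_train = np.eye(5)[[0, 3, 1, 0]]

class_weight = {0: 1.0, 1: 10.5, 2: 4.8, 3: 29.5, 4: 36.4}
class_weight_str = {"0": 1.0, "1": 10.5, "2": 4.8, "3": 29.5, "4": 36.4}

# Integer class index of each sample (argmax of the one-hot row).
labels = np.argmax(y_train, axis=1)
print([class_weight[int(k)] for k in labels])  # [1.0, 29.5, 10.5, 1.0]

# String keys never match integer class indices.
print("0" == 0, 0 in class_weight_str)  # False False
```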
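Notice: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.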