
Class weights worsen my keras classification model

I have a model that classifies some data into 21 target classes. It uses the Adam optimizer and categorical cross-entropy loss. In an attempt to improve the model's loss, I visualized the class frequencies in the data set and found that the two most frequent classes have about 25,000 and 20,000 samples, while the two least frequent have about 4 and 40, with the other classes ranging from 100 to 2,000. I realized this is a stark imbalance and attempted to add class weights, which I extracted using sklearn as shown below.

My y array is one-hot encoded, something like:

class1,   class2, class3, class4 ... class21
   0        0       1       0    ...    0 
   1        0       0       0    ...    0
   0        1       0       0    ...    0
The weights were extracted like so:

from sklearn.utils.class_weight import compute_class_weight
import numpy as np

# collapse the one-hot rows back to integer class labels
y_int = np.argmax(y.to_numpy(), axis=1)
weights = compute_class_weight('balanced', classes=np.unique(y_int), y=y_int)
di = dict(enumerate(weights))  # map class index -> weight for Keras
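For reference, a minimal sketch of how the dict is then passed to training; `model`, `X_train`, and `y_train` are hypothetical stand-ins for a compiled Keras model and the training data:

# pass the {class index: weight} dict to fit(); Keras scales each
# sample's loss contribution by the weight of its class
model.fit(X_train, y_train, epochs=50, class_weight=di)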

but my loss worsened, and by the 50th epoch I was getting loss values in the range of 30-50, which is horrible compared to the roughly 0.4 I was getting without the class weights.

Is there something wrong with the way I extracted the class weights, or should I not be using class weights at all? If not, what should I use instead to account for this huge imbalance? Thanks.

Keras prints the weighted loss during training; you can confirm this by, e.g., doubling all the class weights and watching the reported loss double as well. So the larger loss for the weighted model may just mean that the smaller classes are harder to classify, and now that you're focusing the loss's attention on those smaller classes, you see worse scores.
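A minimal numpy sketch of that check, assuming Keras's default reduction (sum over batch size), under which the reported loss is the mean of each sample's cross-entropy scaled by its class weight; the labels, probabilities, and weights below are toy values for illustration:

import numpy as np

y_true = np.array([0, 2, 1])                      # integer class labels (toy data)
y_prob = np.array([[0.7, 0.2, 0.1],
                   [0.1, 0.3, 0.6],
                   [0.2, 0.5, 0.3]])              # predicted probabilities (toy data)
w = np.array([1.0, 2.0, 3.0])                     # per-class weights

# per-sample cross-entropy: -log of the probability assigned to the true class
ce = -np.log(y_prob[np.arange(len(y_true)), y_true])
print(np.mean(w[y_true] * ce))        # weighted loss
print(np.mean(2 * w[y_true] * ce))    # doubling every weight doubles the loss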
