
How to use data augmentation on uneven multiclass dataset?

I have 12 image classes, and the data is unevenly distributed across them.

The number of images per class is as follows:

X1 = 16

X2 = 203

X3 = 192

X4 = 220

X5 = 172

X6 = 143

X7 = 22

X8 = 89

X9 = 31

X10 = 89

X11 = 10

X12 = 204

I am trying to train a CNN on this dataset. Should I apply data augmentation only to the classes with few images, or to all of the classes? Has anyone trained a similar model? Also, what CNN architecture should I use? I tried the one below (with data augmentation applied to all classes), but I stopped partway through the first epoch because the accuracy was only around 14%.

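from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Activation, Dropout, Flatten, Dense

model = Sequential()
model.add(Conv2D(32, (3, 3), input_shape=input_shape)) # input_shape = (150,150)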
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))

model.add(Conv2D(32, (3, 3)))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))

model.add(Conv2D(64, (3, 3)))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))

model.add(Flatten())
model.add(Dense(64))
model.add(Activation('relu'))
model.add(Dropout(0.5))
model.add(Dense(12))
model.add(Activation('sigmoid'))

Any help would be appreciated. If anyone has any tips, I would like to hear some. It's giving me a hard time lately.

Your data has 12 classes and 1391 images in total. The most frequent class is X4 with 220 images (15.8% of the data), so 15.8% accuracy, which is what a model gets by always predicting X4, is the baseline score you need to beat. You stopped training before the first epoch finished; train for several epochs and see how the accuracy develops.
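The baseline arithmetic above can be checked with a few lines of Python. This is only a sketch: the counts are copied from the question, and the variable names are made up for illustration.

```python
# Image counts per class, copied from the question.
counts = {
    "X1": 16, "X2": 203, "X3": 192, "X4": 220, "X5": 172, "X6": 143,
    "X7": 22, "X8": 89, "X9": 31, "X10": 89, "X11": 10, "X12": 204,
}

total = sum(counts.values())
majority_class = max(counts, key=counts.get)

# Accuracy of a model that always predicts the most frequent class.
baseline_accuracy = counts[majority_class] / total

print(f"total images: {total}")
print(f"majority class: {majority_class} ({counts[majority_class]} images)")
print(f"baseline accuracy: {baseline_accuracy:.1%}")
```

The 14% you saw partway through the first epoch is still below this baseline, so it says little about whether the architecture works.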

With only 1391 images, data augmentation is unavoidable. You can experiment with augmentation on all classes first, and then try adding class weights to see whether the score improves.
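If you also want to try augmenting the small classes more heavily, it helps to see how much extra data that would mean. The sketch below assumes, purely for illustration, that every class should be brought up to the size of the largest one. That target is not something the question requires, and the names are hypothetical.

```python
# Image counts per class, copied from the question.
counts = {
    "X1": 16, "X2": 203, "X3": 192, "X4": 220, "X5": 172, "X6": 143,
    "X7": 22, "X8": 89, "X9": 31, "X10": 89, "X11": 10, "X12": 204,
}

# Assumed target: match the largest class (an illustrative choice).
target = max(counts.values())

# Extra augmented images each class would need to reach the target.
extra_needed = {name: target - n for name, n in counts.items()}

for name, extra in extra_needed.items():
    # Average number of augmented variants per original image.
    per_image = extra / counts[name]
    print(f"{name}: +{extra} images (~{per_image:.1f} variants per original)")
```

For the smallest classes this means around twenty variants of each original image, so the model sees the same few pictures over and over. That is one reason to compare this against class weights rather than rely on it alone. Whichever way you go, augment only the training split.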

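You can fill in a class_weight dictionary and fit your model with it:

# Map each class index to a weight; rarer classes get larger weights.
class_weight = {0 : 1,    1: 1,    2: 5, ....}
# Keras 2 uses `epochs`; `nb_epoch` is the older Keras 1 argument name.
model.fit(X_train, Y_train, epochs=5, batch_size=32, class_weight=class_weight)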
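Rather than picking the weights by hand, one common heuristic is to make each weight inversely proportional to the class frequency. The sketch below uses total / (n_classes * count), the same formula as the "balanced" mode in scikit-learn. It assumes class index 0 corresponds to X1, index 1 to X2, and so on, which may not match how your labels are actually encoded.

```python
# Image counts in class-index order (assumed: index 0 = X1, ..., index 11 = X12).
counts = [16, 203, 192, 220, 172, 143, 22, 89, 31, 89, 10, 204]

total = sum(counts)
n_classes = len(counts)

# Inverse-frequency weights: a class of average size gets weight 1.0,
# rarer classes get more, more frequent classes get less.
class_weight = {i: total / (n_classes * n) for i, n in enumerate(counts)}

for i, w in class_weight.items():
    print(f"class {i}: {counts[i]:>3} images -> weight {w:.2f}")
```

The resulting dictionary can be passed as class_weight to model.fit as shown above. If you load the images with a Keras directory iterator, check its class_indices mapping first, since folder names are sorted alphanumerically and X10 may come before X2.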
