简体   繁体   中英

Increase val_acc in the audio classification

I have 530 data points belonging to 10 classes. I am not sure which numbers should I use for the num_rows and num_columns .

In this code I have num_rows = 40 , num_columns = 174 :

model = Sequential()
model.add(Conv2D(filters=32, kernel_size=2, input_shape=(num_rows, num_columns, num_channels), activation='relu'))
model.add(MaxPooling2D(pool_size=2))
#model.add(Dropout(0.2))

model.add(Conv2D(filters=64, kernel_size=2, kernel_regularizer=l2(0.00001), bias_regularizer=l2(0.0001), activation='relu'))
model.add(MaxPooling2D(pool_size=2))
#model.add(Dropout(0.2))

model.add(Conv2D(filters=128, kernel_size=2, kernel_regularizer=l2(0.00001), bias_regularizer=l2(0.0001), activation='relu'))
model.add(MaxPooling2D(pool_size=2))
model.add(Dropout(0.2))

model.add(Conv2D(filters=128, kernel_size=2, kernel_regularizer=l2(0.00001), bias_regularizer=l2(0.0001),  activation='relu'))
model.add(MaxPooling2D(pool_size=2))
model.add(Dropout(0.2))
#model.add(GlobalAveragePooling2D())

model.add(Flatten())
model.add(Dense(512, activation='relu'))
#model.add(Dropout(0.2))
model.add(Dense(256, activation='relu'))
#model.add(Dropout(0.2))
model.add(Dense(128, activation='relu'))
model.add(Dropout(0.2))
model.add(Dense(64, activation='relu'))
model.add(Dense(32, activation='relu'))

model.add(Dense(10, activation='softmax'))
# Compile the model
#opt = keras.optimizers.Adam(learning_rate=0.001)
model.compile(loss='categorical_crossentropy', metrics=\['accuracy'\], optimizer="Adam")

模型损失

I am guessing you have some sort of spectrograms on your input (since you're working with audio, but have 3-dimensional shape on input). Your input_shape has to reflect the size of images that you pass on input. Simply check their width and height - these are your num_rows and num_columns .

According to that code, the images have 3 colour bands. That makes sense for photos, but rarely for spectrograms. Remember these are false colours that typically are generated to create visually-pleasing visualisations, but don't get you anything when doing classification. Single channel is enough, the pixel intensity reflects strength (amplitude) of the signal.

Three simple things you can do:

  • Use monochromatic images, eg input_shape=(num_rows, num_columns, 1) . Colour only confuses the classifier.
  • Get more data and use augmentation.
  • kernel_size=2 makes little sense. Read on convolutions first and what are the kernels.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM