
Trying to use the VGG16 Keras model for audio

I'm trying to use the VGG16 Keras model for sound prediction. I'm only changing the last layers for my prediction:

from keras.applications.vgg16 import VGG16
from keras.models import Sequential
from keras.layers import GlobalAveragePooling2D, Dense, Dropout

# VGG16 convolutional base with ImageNet weights, no classifier on top
base_model = VGG16(include_top=False,
                   input_shape=(128, 431, 3),
                   weights='imagenet')

# Custom classification head for binary prediction
model = Sequential()
model.add(base_model)
model.add(GlobalAveragePooling2D())
model.add(Dense(1024, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(1, activation='sigmoid'))
model.summary()

The shape of my data is (128, 431, 1); I obtained it with librosa's melspectrogram function.

However, the Keras model expects an input shape of (128, 431, 3).

I tried to use numpy's stack method to duplicate the channel, but the process stalled. I think this is because there is too much data.
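For reference, a minimal sketch of that channel-duplication idea, assuming the spectrograms are held in a NumPy array of shape (n_samples, 128, 431, 1) (the array name and batch size here are hypothetical):

import numpy as np

# Hypothetical batch of single-channel mel spectrograms: (n_samples, 128, 431, 1)
specs = np.random.rand(16, 128, 431, 1).astype(np.float32)

# Repeat the single channel three times along the last axis -> (n_samples, 128, 431, 3).
# This triples the memory footprint, which may be why the original stacking attempt stalled.
specs_rgb = np.repeat(specs, 3, axis=-1)
print(specs_rgb.shape)  # (16, 128, 431, 3)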

The problem is with using the ImageNet weights: they require the input to have 3 channels. When creating the model, set weights to None and it should work.

import keras

base_model = keras.applications.vgg16.VGG16(include_top=False,
                                             input_shape=(128, 431, 1),
                                             weights=None)
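For completeness, a minimal sketch of how that 1-channel base model could be combined with the classification head from the question (the layer sizes are taken from the question; the compile settings are placeholder assumptions):

from keras.models import Sequential
from keras.layers import GlobalAveragePooling2D, Dense, Dropout

model = Sequential()
model.add(base_model)
model.add(GlobalAveragePooling2D())
model.add(Dense(1024, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(1, activation='sigmoid'))  # binary prediction, as in the question

# Placeholder compile step; the loss matches the sigmoid output.
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
model.summary()

Note that with weights=None the network is trained from scratch, so the pretrained ImageNet features are not used; the alternative is to keep the 3-channel input and replicate the mel spectrogram across channels, as attempted in the question.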
