
Trying to use the VGG16 Keras model for audio

I'm trying to use the VGG16 Keras model for sound prediction. I'm only changing the last layers for my prediction:

from keras.applications.vgg16 import VGG16
from keras.models import Sequential
from keras.layers import GlobalAveragePooling2D, Dense, Dropout

# VGG16 convolutional base with ImageNet weights, no classifier on top
base_model = VGG16(include_top=False,
                   input_shape=(128, 431, 3),
                   weights='imagenet')

# Custom classification head for binary prediction
model = Sequential()
model.add(base_model)
model.add(GlobalAveragePooling2D())
model.add(Dense(1024, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(1, activation='sigmoid'))
model.summary()

The shape of my data is (128, 431, 1); I obtained it with librosa's melspectrogram function.

However, the Keras model expects an input shape of (128, 431, 3).

I tried to use numpy's stack method to duplicate the channel, but the process stalled. I think this is because there is too much data.
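For reference, a minimal sketch of that channel-duplication idea, assuming the spectrograms are held in a NumPy array of shape (n_samples, 128, 431, 1) (the array name and batch size here are hypothetical):

import numpy as np

# Hypothetical batch of single-channel mel spectrograms: (n_samples, 128, 431, 1)
specs = np.random.rand(16, 128, 431, 1).astype(np.float32)

# Repeat the single channel three times along the last axis -> (n_samples, 128, 431, 3).
# This triples the memory footprint, which may be why the original stacking attempt stalled.
specs_rgb = np.repeat(specs, 3, axis=-1)
print(specs_rgb.shape)  # (16, 128, 431, 3)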

The problem is with using the ImageNet weights: they require the input to have 3 channels. When creating the model, set weights to None and it should work.

import keras

base_model = keras.applications.vgg16.VGG16(include_top=False,
                                             input_shape=(128, 431, 1),
                                             weights=None)
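For completeness, a minimal sketch of how that 1-channel base model could be combined with the classification head from the question (the layer sizes are taken from the question; the compile settings are placeholder assumptions):

from keras.models import Sequential
from keras.layers import GlobalAveragePooling2D, Dense, Dropout

model = Sequential()
model.add(base_model)
model.add(GlobalAveragePooling2D())
model.add(Dense(1024, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(1, activation='sigmoid'))  # binary prediction, as in the question

# Placeholder compile step; the loss matches the sigmoid output.
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
model.summary()

Note that with weights=None the network is trained from scratch, so the pretrained ImageNet features are not used; the alternative is to keep the 3-channel input and replicate the mel spectrogram across channels, as attempted in the question.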
