MLP for speech recognition

I am trying to learn speech recognition and so I am using a simple MLP for starters.

Below is the code:

# Simple MLP model
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

num_labels = Y.shape[1]
filter_size = 2

# Construct model 
model = Sequential()

model.add(Dense(256, input_shape=(32,)))



# Compile the model
model.compile(loss='categorical_crossentropy', metrics=['accuracy'], optimizer='adam')

# Display model architecture summary
model.summary()

# Calculate pre-training accuracy 

score = model.evaluate(X_test, Y_test, verbose=2)
accuracy = 100*score[1]

print("Pre-training accuracy: %.4f%%" % accuracy)

I am using MFCC for feature extraction and MLB for one-hot encoding.

The shapes of the data arrays are as follows:

X_train: (54296, 99, 32)
X_val: (6787, 99, 32)
X_test: (6788, 99, 32)
Y_train: (54296, 31)
Y_val: (6787, 31)
Y_test: (6788, 31)

I am getting the following errors:

  1. WARNING:tensorflow:Model was constructed with shape (None, 32) for input Tensor("dense_21_input:0", shape=(None, 32), dtype=float32), but it was called on an input with incompatible shape (None, 99, 32).

When I change the input_shape to (99, 32), the warning disappears. Can anybody explain the reason?

  2. ValueError: Shapes (None, 31) and (None, 99, 31) are incompatible (This one appears when I try to calculate the pre-training accuracy.)

I have no idea how to deal with this error.

I look forward to receiving some help.


In the line model.add(Dense(256, input_shape=(32,))), you declare the input shape as (32,), so the model expects batches of shape (batch_size, 32). Your inputs actually have shape (batch_size, 99, 32), which is why the warning appears and why it disappears once you specify input_shape=(99, 32).
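To see what happens to the shapes, here is a minimal NumPy sketch. It assumes that a Dense layer without activation behaves like x @ W + b applied to the last axis of its input, which is how Keras documents Dense for inputs with more than two dimensions. The dense helper and the random weights are placeholders for illustration, not part of the original model.

```python
import numpy as np

rng = np.random.default_rng(0)

def dense(x, units):
    """Hypothetical stand-in for Dense without activation: x @ W + b on the last axis."""
    W = rng.standard_normal((x.shape[-1], units))
    b = np.zeros(units)
    return x @ W + b

batch_2d = rng.standard_normal((4, 32))      # what input_shape=(32,) expects
batch_3d = rng.standard_normal((4, 99, 32))  # what the MFCC data actually looks like

out_2d = dense(batch_2d, 256)
out_3d = dense(batch_3d, 256)

print(out_2d.shape)  # (4, 256)
print(out_3d.shape)  # (4, 99, 256)
```

Note that the 99 time steps survive in the 3D case: the layer is applied to each frame separately, so the time axis is carried through to the output.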

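As for the pre-training accuracy, the error follows from the same shapes. A Dense layer only transforms the last axis of its input, so with X_test of shape (6788, 99, 32) every layer keeps the 99 time steps and the model outputs (None, 99, 31), assuming its last layer has num_labels = 31 units. Y_test, however, has shape (6788, 31), that is, one label vector per clip rather than one per frame, so the loss cannot compare the two.
Changing input_shape to (99, 32) silences the warning but does not fix this. You need to collapse the time axis before the output, for example by adding a Flatten layer in front of the first Dense layer or by reshaping each sample to a vector of 99*32 = 3168 features.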
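One way to make the model output match labels of shape (None, 31) is to flatten each (99, 32) sample before the first Dense layer. The following is only a sketch under assumptions: the hidden size of 256 comes from the question, while the relu and softmax activations, the two-layer layout, and the dummy arrays are placeholders rather than the asker's actual setup.

```python
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Flatten, Input

num_labels = 31  # Y.shape[1] in the question

model = Sequential([
    Input(shape=(99, 32)),                    # one sample = 99 frames x 32 MFCC features
    Flatten(),                                # (None, 99, 32) -> (None, 3168)
    Dense(256, activation='relu'),            # assumed activation
    Dense(num_labels, activation='softmax'),  # assumed output layer
])
model.compile(loss='categorical_crossentropy', metrics=['accuracy'], optimizer='adam')

# Dummy data with the same per-sample shapes as X_test and Y_test
X_dummy = np.random.rand(4, 99, 32).astype('float32')
Y_dummy = np.eye(num_labels, dtype='float32')[[0, 1, 2, 3]]

preds = model.predict(X_dummy, verbose=0)
score = model.evaluate(X_dummy, Y_dummy, verbose=0)
print(preds.shape)  # (4, 31)
```

With the time axis flattened, model.evaluate no longer raises the shape error. Flattening discards the ordering of frames as far as the network is concerned, so a pooling layer over time or a recurrent or convolutional model may be worth trying later.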

