Python Neural Networks - Error when checking input: expected conv2d_1_input to have 4 dimensions, but got array with shape (700, 128, 33)

So I am working on a "music genre classification" project, using the GTZAN dataset to create a simple CNN that classifies the genre of an audio file.

My code for the model training, validation and testing is below:

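from keras.models import Sequential
from keras.layers import Conv2D, AveragePooling2D
from keras.optimizers import Adam

# (height, width, channels) of a single sample; the batch axis is not included
input_shape = (genre_features.train_X.shape[1], genre_features.train_X.shape[2], 1)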
print("Build CNN model ...")
model = Sequential()

model.add(Conv2D(24, (5, 5), strides=(1, 1), input_shape=input_shape))
model.add(AveragePooling2D((2, 2), strides=(2,2)))

model.add(Conv2D(48, (5, 5), padding="same"))
model.add(AveragePooling2D((2, 2), strides=(2,2)))

model.add(Conv2D(48, (5, 5), padding="same"))
model.add(AveragePooling2D((2, 2), strides=(2,2)))



print("Compiling ...")
opt = Adam()
model.compile(loss="categorical_crossentropy", optimizer=opt, metrics=["accuracy"])

print("Training ...")
batch_size = 35  # num of training examples per minibatch
num_epochs = 400

print("\nValidating ...")
score, accuracy = model.evaluate(
    genre_features.dev_X, genre_features.dev_Y, batch_size=batch_size, verbose=1
)
print("Dev loss:  ", score)
print("Dev accuracy:  ", accuracy)

print("\nTesting ...")
score, accuracy = model.evaluate(
    genre_features.test_X, genre_features.test_Y, batch_size=batch_size, verbose=1
)
print("Test loss:  ", score)
print("Test accuracy:  ", accuracy)

# Creates an HDF5 file 'lstm_genre_classifier_lstm.h5'
model_filename = "lstm_genre_classifier_lstm.h5"
print("\nSaving model: " + model_filename)

And when I try to train the model I get the following error (I also printed the train, validation and test shapes before compiling the model):

Training X shape: (700, 128, 33)
Training Y shape: (700, 10)
Dev X shape: (200, 128, 33)
Dev Y shape: (200, 10)
Test X shape: (100, 128, 33)
Test Y shape: (100, 10)
Build CNN model ...
2020-12-25 15:46:58.410663: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX AVX2
Compiling ...
Model: "sequential_1"
Layer (type)                 Output Shape              Param #
conv2d_1 (Conv2D)            (None, 124, 29, 24)       624
average_pooling2d_1 (Average (None, 62, 14, 24)        0
activation_1 (Activation)    (None, 62, 14, 24)        0
conv2d_2 (Conv2D)            (None, 62, 14, 48)        28848
average_pooling2d_2 (Average (None, 31, 7, 48)         0
activation_2 (Activation)    (None, 31, 7, 48)         0
conv2d_3 (Conv2D)            (None, 31, 7, 48)         57648
average_pooling2d_3 (Average (None, 15, 3, 48)         0
activation_3 (Activation)    (None, 15, 3, 48)         0
flatten_1 (Flatten)          (None, 2160)              0
dropout_1 (Dropout)          (None, 2160)              0
dense_1 (Dense)              (None, 64)                138304
activation_4 (Activation)    (None, 64)                0
dropout_2 (Dropout)          (None, 64)                0
dense_2 (Dense)              (None, 10)                650
activation_5 (Activation)    (None, 10)                0
Total params: 226,074
Trainable params: 226,074
Non-trainable params: 0
Training ...
Traceback (most recent call last):
  File "cnn.py", line 82, in <module>
  File "C:\Users\Bharat.000\miniconda3\lib\site-packages\keras\engine\training.py", line 1154, in fit
  File "C:\Users\Bharat.000\miniconda3\lib\site-packages\keras\engine\training.py", line 579, in _standardize_user_data
  File "C:\Users\Bharat.000\miniconda3\lib\site-packages\keras\engine\training_utils.py", line 135, in standardize_input_data
    'with shape ' + str(data_shape))
ValueError: Error when checking input: expected conv2d_1_input to have 4 dimensions, but got array with shape (700, 128, 33)

I tried a few solutions from similar questions, but I could not understand much since I am new to this topic. Any help on what I should change to get proper output would be appreciated.

Your input dimension is wrong. Are you sure your data is 2D (like images) and not 1D (like sound waves)? If your data is 1D, then you should be doing 1-dimensional convolutions. The error occurs because your training data has the shape (700, 128, 33), where 700 is the number of data points. Conv2D in Keras needs 4 dimensions: (batch size, image_height, image_width, channels). Channels could be first or last depending on the data format, but that's not really relevant here. What I am trying to say is that your array has only three axes, so the channels axis that your input_shape of (128, 33, 1) promises is missing. If each sample really is a 128 x 33 image, add that axis so the array becomes (700, 128, 33, 1), and do the same for the dev and test sets. If 128 is just a sequence length with 33 features per step, maybe what you're looking for is 1-dimensional conv.
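A minimal sketch of giving Conv2D the 4-dimensional input it expects, assuming the features are plain NumPy arrays and that each 128 x 33 sample can be treated as a single-channel image. The arrays below are zero-filled stand-ins with the shapes printed in the question, and add_channel_axis is a hypothetical helper name, not part of Keras:

```python
import numpy as np

# Stand-ins with the shapes printed in the question; in the real script
# these would be genre_features.train_X, dev_X and test_X (assumption).
train_X = np.zeros((700, 128, 33), dtype=np.float32)
dev_X = np.zeros((200, 128, 33), dtype=np.float32)
test_X = np.zeros((100, 128, 33), dtype=np.float32)


def add_channel_axis(x):
    """Append a trailing channels axis: (N, H, W) -> (N, H, W, 1)."""
    return np.expand_dims(x, axis=-1)


train_X_4d = add_channel_axis(train_X)
dev_X_4d = add_channel_axis(dev_X)
test_X_4d = add_channel_axis(test_X)

# Per-sample shape to pass as input_shape to the first Conv2D layer.
input_shape = train_X_4d.shape[1:]

print(train_X_4d.shape)  # (700, 128, 33, 1)
print(input_shape)       # (128, 33, 1)
```

The same 4D arrays would then have to be passed to fit and to both evaluate calls, since all of them go through the same input check. The labels keep their (N, 10) shape.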

