Convolutional Neural Network Input Shape with Keras

I have 32760 audio spectra, each of dimensions 72 (# frames) x 40 (# frequency bands), that I am trying to feed into a "wide" convolutional neural network (the first layer is an ensemble of 4 different conv layers). These spectra have no depth, so each can be represented as a 72 x 40 2D numpy array of floats. The X input to the classifier is therefore an array 32760 elements long, each element being one of these 72 x 40 x 1 spectra. The Y input is an array of one-hot encoded labels, also with 32760 elements.

When trying to fit the CNN using

model.fit(mono_X, mono_Y, epochs=10, batch_size=None, verbose=2)

I get the following error:

ValueError when checking input: expected input_47 to have 4 dimensions, but got array with shape (32760, 1)

Below is the architecture of my CNN:

spectra = Input(shape=(72, 40, 1)) 

# conv1a
c1a = Conv2D(48, (3,5), activation='relu', padding = 'same')(spectra)
c1a = BatchNormalization()(c1a)
c1a = MaxPooling2D(pool_size=(5, 5), strides = 1)(c1a)
# conv1b
c1b = Conv2D(32, (3,9), activation='relu', padding = 'same')(spectra)
c1b = BatchNormalization()(c1b)
c1b = MaxPooling2D(pool_size=(5, 5), strides = 1)(c1b)
# conv1c
c1c = Conv2D(16, (3,15), activation='relu', padding = 'same')(spectra)
c1c = BatchNormalization()(c1c)
c1c = MaxPooling2D(pool_size=(5, 5), strides = 1)(c1c)
# conv1d
c1d = Conv2D(16, (3,21), activation='relu', padding = 'same')(spectra)
c1d = BatchNormalization()(c1d)
c1d = MaxPooling2D(pool_size=(5, 5), strides = 1)(c1d)

# stack the layers
merged = keras.layers.concatenate([c1a, c1b, c1c, c1d], axis=3)

# conv2
c2 = Conv2D(224, (5,5), activation='relu')(merged)
c2 = BatchNormalization()(c2)
c2 = MaxPooling2D(pool_size=(5, 5), strides = 1)(c2)

# output softmax
out = Dense(15, activation='softmax')(c2)

# create Model
model = Model(spectra, out)

# apply optimization and loss function
adam = Adam(lr=0.002, beta_1=0.9, beta_2=0.999, epsilon=None, decay=0.0, amsgrad=False)
model.compile(optimizer=adam,
            loss='categorical_crossentropy',
            metrics=['accuracy'])

However, if I try to change the input shape to 32760x1, I receive the following error:

ValueError: Input 0 is incompatible with layer conv2d_203: expected ndim=4, found ndim=3

What am I doing wrong here? Is there a better way to represent my input data? I have already tried using a pandas DataFrame where each row represents one of the spectra, as well as a myriad of other combinations. I am using Python 3.6.5 with Keras 2.1.3 and TensorFlow 1.1.0 as the backend.

This is my first CNN; I have only implemented ANNs previously using Keras, so there may be a very obvious mistake I'm making. Any help appreciated!


Update! Taking the advice from @enumaris, using data_format=channels_last as a param on the input layer and adding a Flatten() layer between the last Conv2D and the softmax output layer fixed the latter value error. Now I've come to realize that my training data mono_X is of the wrong shape. The expected input shape, if I'm not mistaken, should be (#samples, H, W, #channels). mono_X is of shape (32760,) while mono_X[0] is of shape (72, 40). Using numpy's reshape doesn't seem to be able to unpack these nested arrays. How can I properly prepare the input tensor?

The input shape is (72, 40, 1), but you say that the elements of mono_X have shape (72, 40). The data needs to be reshaped, possibly when preparing the training data, like mono_X = mono_X.reshape(-1, 72, 40, 1). This assumes that mono_X is a numpy array of shape (#samples, 72, 40), but for some reason it sounds like you have a numpy array of numpy arrays.
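If mono_X really is a 1D object array whose elements are separate (72, 40) arrays, reshape alone cannot merge them, which matches the (32760,) shape reported in the update. A minimal sketch of one way to build a proper 4D array is below. It assumes every spectrum has exactly the same shape; the helper name to_cnn_input and the random stand-in data are hypothetical and only for illustration.

```python
import numpy as np


def to_cnn_input(spectra_list):
    """Stack equally sized 2D spectra into a (samples, H, W, 1) float array.

    Hypothetical helper: assumes every element has the same 2D shape.
    """
    # np.stack joins the separate 2D arrays along a new leading axis: (N, H, W)
    stacked = np.stack(list(spectra_list)).astype("float32")
    # Append a trailing channel axis: (N, H, W, 1)
    return stacked[..., np.newaxis]


# Stand-in for mono_X: a 1D object array holding separate 2D arrays
n_samples = 8
mono_X = np.empty(n_samples, dtype=object)
for i in range(n_samples):
    mono_X[i] = np.random.rand(72, 40)

X = to_cnn_input(mono_X)
print(mono_X.shape)  # 1D object array, one entry per spectrum
print(X.shape)       # 4D array with a trailing channel axis
```

The resulting X could then be passed to model.fit in place of mono_X. If the spectra do not all share one shape, np.stack raises a ValueError, which is a useful early check on the data.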

You can also reshape inside the model with a Keras Reshape layer, like this:

inputs = Input(shape=(72, 40))
# add the channel axis; keep `inputs` separate so the model can be built as Model(inputs, out)
spectra = Reshape((72, 40, 1))(inputs)

Personally, I would reshape before training and not in the model, to avoid any additional overhead in training.
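As a side note on the Flatten() fix mentioned in the update, the layer shapes in the question's architecture can be traced by hand. The sketch below is plain Python arithmetic, not Keras itself. It assumes stride-1 convolutions, 'same' padding preserving height and width, 'valid' padding elsewhere, and that Dense applied to a multi-dimensional input acts only on the last axis. The helper names are hypothetical.

```python
def conv2d_shape(shape, filters, kernel, padding="valid"):
    """Output (H, W, C) of a stride-1 2D convolution."""
    h, w, _ = shape
    kh, kw = kernel
    if padding == "same":
        return (h, w, filters)
    return (h - kh + 1, w - kw + 1, filters)


def maxpool_shape(shape, pool, stride):
    """Output (H, W, C) of max pooling with 'valid' padding."""
    h, w, c = shape
    ph, pw = pool
    return ((h - ph) // stride + 1, (w - pw) // stride + 1, c)


inp = (72, 40, 1)

# The four parallel first-layer branches from the question
branches = []
for filters, kernel in [(48, (3, 5)), (32, (3, 9)), (16, (3, 15)), (16, (3, 21))]:
    s = conv2d_shape(inp, filters, kernel, padding="same")
    branches.append(maxpool_shape(s, (5, 5), 1))

# Concatenation along the channel axis sums the filter counts
merged = (branches[0][0], branches[0][1], sum(b[2] for b in branches))

# conv2 followed by pooling
c2 = maxpool_shape(conv2d_shape(merged, 224, (5, 5)), (5, 5), 1)

# Without Flatten, Dense(15) only replaces the last axis
dense_out = c2[:2] + (15,)
# With Flatten, Dense(15) sees one long feature vector per sample
flat = c2[0] * c2[1] * c2[2]

print(merged)     # (68, 36, 112)
print(c2)         # (60, 28, 224)
print(dense_out)  # (60, 28, 15)
print(flat)       # 376320
```

Under these assumptions, the output without Flatten() is still a 3D tensor per sample, which cannot match one-hot labels of shape (15,). With Flatten(), the Dense layer receives 376320 features per sample, so it may be worth pooling more aggressively to keep the parameter count manageable.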
