
Why is my CNN regression network not learning?

I am running a convolutional neural network for a regression task. The network takes a 55x1756 image as input and outputs another image of dimensions 11x1756. For this reason, the last layer of my architecture (shown below) is a dense layer whose number of units is the product of the output dimensions.
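To make that sizing explicit (a small arithmetic check, not part of the original code):

output_height, output_width = 11, 1756
dense_units = output_height * output_width   # 11 * 1756 = 19316, matching Dense(19316) below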

As shown below, I am using the "tanh" activation function and "adam" as the optimizer. I have been training the network for some time now, but the result is pretty much always the same: the loss stays flat, as does the root mean squared error, and on top of that the validation loss is lower than the training loss, which is not ideal. Attached below are both the training output and the model summary.

Do you have any suggestions on how I could improve it? Thanks in advance!

# imports assumed from standalone Keras; adjust to tensorflow.keras if needed
import numpy as np
from keras.models import Sequential
from keras.layers import Conv2D, Activation, MaxPooling2D, BatchNormalization, Flatten, Dense
from keras import backend

def generator(data_arr, batch_size = 10):

    # number of full batches available in the data
    num = len(data_arr)
    num = int(num / batch_size)

    # Loop forever so the generator never terminates
    while True: 

        for offset in range(0, num):

            batch_samples = (data_arr[offset*batch_size:(offset+1)*batch_size])

            samples = []
            labels = []

            for batch_sample in batch_samples:

                samples.append(batch_sample[0])
                labels.append((np.array(batch_sample[1].flatten())).transpose())

            X_ = np.array(samples)
            Y_ = np.array(labels)

            X_ = X_[:, :, :, np.newaxis]   # add the channel dimension

            yield (X_, Y_)

# compile and train the model using the generator function
train_generator = generator(training_data, batch_size = 10)
validation_generator = generator(val_data, batch_size = 10)

model = Sequential()

model.add(Conv2D(4, (2, 2), input_shape = (55, 1756, 1)))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size = (3, 3)))
model.add(BatchNormalization())

model.add(Conv2D(8, (2, 2)))
model.add(Activation('tanh'))
model.add(MaxPooling2D(pool_size = (3, 3)))
model.add(BatchNormalization())

model.add(Conv2D(16, (2, 2)))
model.add(Activation('tanh'))
model.add(MaxPooling2D(pool_size = (3, 3)))
model.add(BatchNormalization())

model.add(Flatten())
model.add(Dense(19316))
model.add(Activation('softmax'))

def nrmse(y_true, y_pred):
    return backend.sqrt(backend.mean(backend.square(y_pred - y_true))) / 2

def rmse(y_true, y_pred):
    return backend.sqrt(backend.mean(backend.square(y_pred - y_true)))

model.compile(loss = 'mean_squared_error',
              optimizer = 'adam',
              metrics = [rmse, nrmse, 'mae'])

model.summary()
Layer (type)                 Output Shape              Param #   
=================================================================
conv2d_1 (Conv2D)            (None, 27, 878, 4)        20        
_________________________________________________________________
activation_1 (Activation)    (None, 27, 878, 4)        0         
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 9, 292, 4)         0         
_________________________________________________________________
batch_normalization_1 (Batch (None, 9, 292, 4)         16        
_________________________________________________________________
conv2d_2 (Conv2D)            (None, 8, 291, 8)         136       
_________________________________________________________________
activation_2 (Activation)    (None, 8, 291, 8)         0         
_________________________________________________________________
max_pooling2d_2 (MaxPooling2 (None, 2, 97, 8)          0         
_________________________________________________________________
batch_normalization_2 (Batch (None, 2, 97, 8)          32        
_________________________________________________________________
flatten_1 (Flatten)          (None, 1552)              0         
_________________________________________________________________
dense_1 (Dense)              (None, 19316)             29997748  
_________________________________________________________________
activation_3 (Activation)    (None, 19316)             0

=================================================================
Total params: 29,997,952
Trainable params: 29,997,928
Non-trainable params: 24
_________________________________________________________________
Epoch 1/6
6660/6660 [==============================] - 425s 64ms/step - loss: 0.0135 - rmse: 0.0986 - nrmse: 0.0577 - mean_absolute_error: 0.0333 - val_loss: 0.0133 - val_rmse: 0.0971 - val_nrmse: 0.0572 - val_mean_absolute_error: 0.0327
Epoch 2/6
6660/6660 [==============================] - 422s 63ms/step - loss: 0.0135 - rmse: 0.0986 - nrmse: 0.0577 - mean_absolute_error: 0.0332 - val_loss: 0.0133 - val_rmse: 0.0971 - val_nrmse: 0.0572 - val_mean_absolute_error: 0.0327
Epoch 3/6
6660/6660 [==============================] - 422s 63ms/step - loss: 0.0135 - rmse: 0.0986 - nrmse: 0.0577 - mean_absolute_error: 0.0332 - val_loss: 0.0133 - val_rmse: 0.0971 - val_nrmse: 0.0572 - val_mean_absolute_error: 0.0327
Epoch 4/6
6660/6660 [==============================] - 422s 63ms/step - loss: 0.0135 - rmse: 0.0986 - nrmse: 0.0577 - mean_absolute_error: 0.0332 - val_loss: 0.0133 - val_rmse: 0.0971 - val_nrmse: 0.0572 - val_mean_absolute_error: 0.0327
Epoch 5/6
6660/6660 [==============================] - 422s 63ms/step - loss: 0.0135 - rmse: 0.0986 - nrmse: 0.0577 - mean_absolute_error: 0.0332 - val_loss: 0.0133 - val_rmse: 0.0971 - val_nrmse: 0.0572 - val_mean_absolute_error: 0.0327
Epoch 6/6
6660/6660 [==============================] - 421s 63ms/step - loss: 0.0135 - rmse: 0.0986 - nrmse: 0.0577 - mean_absolute_error: 0.0332 - val_loss: 0.0133 - val_rmse: 0.0971 - val_nrmse: 0.0572 - val_mean_absolute_error: 0.03274

It could be a vanishing gradient problem, which can occur when you use activation functions other than ReLU. Try changing the activations to ReLU and see whether training improves.
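A minimal sketch of that suggestion, based on the model from the question with only the activations changed. Note that I have also replaced the final Activation('softmax') with a linear output, which is my own assumption for a regression target rather than part of the suggestion above:

from keras.models import Sequential
from keras.layers import Conv2D, Activation, MaxPooling2D, BatchNormalization, Flatten, Dense

model = Sequential()

model.add(Conv2D(4, (2, 2), input_shape = (55, 1756, 1)))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size = (3, 3)))
model.add(BatchNormalization())

model.add(Conv2D(8, (2, 2)))
model.add(Activation('relu'))        # was 'tanh'
model.add(MaxPooling2D(pool_size = (3, 3)))
model.add(BatchNormalization())

model.add(Conv2D(16, (2, 2)))
model.add(Activation('relu'))        # was 'tanh'
model.add(MaxPooling2D(pool_size = (3, 3)))
model.add(BatchNormalization())

model.add(Flatten())
model.add(Dense(19316))              # linear output: 11 * 1756 regression targets (assumption)

model.compile(loss = 'mean_squared_error', optimizer = 'adam', metrics = ['mae'])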
