
CNN converging only for binary_crossentropy loss function and failing on the test dataset

The problem is to train a CNN on a data set whose inputs are (n, n) matrices of numbers between 0 and 1000. The outputs are the very same (n, n) matrices, but with the 0 entries patched with values between 1 and 1000 (similar to image denoising).

Example:
Input:             Output: 
135 0   283 4      135 75  283 4
25  38  43  0      25  38  43  815
0   99  820 481    533 99  820 481
728 331 3   28     728 331 3   28

The outputs were generated by running the inputs through a genetic algorithm that optimizes an external cost function (this function is quite complex and is computed in an external dedicated program). The entire data set contains 3000 input/output pairs, each of shape (10, 10).

Since the problem resembles image-processing tasks, I decided to use a CNN. X_data contains the values of each input matrix, and Y_data contains binary encodings (10 bits per value) of the patched output data. (I worked out this encoding to reduce the output size and improve convergence speed.)

Example:
X_data:            Y_data:
135 0   283 4      0 0 1 0 0 0 0 1 1 1 0 0 0 1 0 0 1 0 1 1...
25  38  43  0
0   99  820 481
728 331 3   28
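The post doesn't show how the binary encoding is produced, but the Y_data example is consistent with each value being written as a fixed 10-bit big-endian pattern (135 → 0 0 1 0 0 0 0 1 1 1). A minimal sketch of one such scheme, with hypothetical helper names `encode_binary`/`decode_binary`:

```python
import numpy as np

def encode_binary(matrix, n_bits=10):
    """Encode each integer (0..1023) as a fixed-width big-endian bit vector
    and concatenate: a (10, 10) matrix becomes a 1000-long 0/1 array."""
    flat = matrix.flatten().astype(int)
    # Shift each value right by 9, 8, ..., 0 and mask the lowest bit.
    bits = (flat[:, None] >> np.arange(n_bits - 1, -1, -1)) & 1
    return bits.flatten()

def decode_binary(bits, shape=(10, 10), n_bits=10):
    """Inverse: fold consecutive groups of n_bits back into integers."""
    groups = bits.reshape(-1, n_bits)
    weights = 1 << np.arange(n_bits - 1, -1, -1)  # 512, 256, ..., 1
    return (groups * weights).sum(axis=1).reshape(shape)
```

With this scheme, encoding the example matrix starts with the bits 0 0 1 0 0 0 0 1 1 1 for 135, matching the Y_data shown above.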

The network converges only with the 'binary_crossentropy' loss function, but when I evaluate the trained CNN on new inputs there is no improvement. (I measure this by comparing the CNN output with the genetic-algorithm-optimized output under the same external cost function.)

The questions: Are the training input and output data sets suitable for this kind of problem? Is the data set large enough to obtain proper results, or should I provide more training data? Is the encoding of the output data a good approach, or is it what's making the trained CNN fail? If there are any other flaws in the approach, please help me work them out!

from tensorflow.keras.layers import (Input, BatchNormalization, Conv2D,
                                     Activation, Reshape, MaxPooling2D,
                                     SpatialDropout2D, Flatten, Dense)
from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import Adadelta

dim = 10
inshape = (dim, dim, 1)
outshape = dim * dim * 10

inp = Input(inshape)
x = BatchNormalization()(inp)
x = Conv2D(1000, kernel_size=3, padding='same')(x)
x = Activation('relu')(x)
x = Reshape((100, 100, -1))(x)
x = MaxPooling2D(pool_size=(5, 5))(x)
x = SpatialDropout2D(rate=0.5)(x, training=True)
x = Conv2D(250, kernel_size=3, padding='same')(x)
x = Activation('relu')(x)
x = MaxPooling2D(pool_size=(4, 4))(x)
x = SpatialDropout2D(rate=0.3)(x, training=True)
x = Conv2D(400, kernel_size=3, padding='same')(x)
x = Activation('sigmoid')(x)
x = MaxPooling2D(pool_size=(2, 2))(x)
x = Conv2D(2500, kernel_size=3, padding='same')(x)
x = Activation('sigmoid')(x)
x = MaxPooling2D(pool_size=(2, 2))(x)
x = BatchNormalization()(x)
x = Flatten()(x)
out = Dense(outshape, activation='sigmoid', name='out')(x)

model = Model(inputs=inp, outputs=out)
model.compile(optimizer=Adadelta(learning_rate=1.0),
              loss='binary_crossentropy', metrics=['accuracy'])

First of all, do you really need a CNN, or would fully connected layers suffice? To answer this question, ask yourself whether there is a spatial relationship between the values of your input data, as there is in images. Now for your questions:

Are the training input and output data sets compatible with this kind of problem?

Yes, they are. What you need is called a denoising autoencoder (CNN or fully connected, depending on your data). An autoencoder applies two functions in succession, f(g(x)) = x', where x and x' have the same size. To stop the autoencoder from learning the identity function, the layers in the middle are made significantly smaller. Denoising is a variant in which the zeros in the input array are treated as noise, and you want to reconstruct the missing information in the output layer.
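To make the shape of such a model concrete, here is a minimal fully connected denoising autoencoder sketch for the (10, 10) matrices, assuming Keras / TensorFlow 2.x; the layer sizes and the `x_noisy`/`y_clean` array names are illustrative assumptions, not tuned values:

```python
from tensorflow.keras import Input, Model, layers

dim = 10
inp = Input((dim, dim, 1))
x = layers.Flatten()(inp)
x = layers.Dense(64, activation='relu')(x)   # encoder: smaller than 100 inputs
x = layers.Dense(32, activation='relu')(x)   # bottleneck forces a compressed code
x = layers.Dense(64, activation='relu')(x)   # decoder mirrors the encoder
x = layers.Dense(dim * dim, activation='linear')(x)
out = layers.Reshape((dim, dim, 1))(x)       # same size as the input, as f(g(x)) = x' requires

autoencoder = Model(inp, out)
autoencoder.compile(optimizer='adam', loss='mse')

# x_noisy: input matrices containing zeros; y_clean: the GA-patched targets,
# both scaled to a small range such as [0, 1] before training:
# autoencoder.fit(x_noisy, y_clean, epochs=50, batch_size=32)
```

The key structural point is the narrow middle layer: the network cannot simply copy the input through, so it must learn regularities that let it fill in the zeroed entries.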

Is the data set large enough to obtain proper results, or do I need more data?

It depends on the complexity of the function that relates predictors to outcomes, as well as on the entropy of your data. Try modifying the width and depth of your layers to see if it helps.

Is the encoding of the output data a good approach, or is it what's making the trained CNN fail?

One-hot encoding is useful when you are dealing with a classification task, not a regression task. From what I understand, what you have here is a regression task, even if you are trying to output integers. You could simply try to reconstruct the input x on the output layer as a matrix of real numbers, and eventually round the values, computing your loss as a simple RMSE (binary cross-entropy is for classification tasks).
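A hedged sketch of that regression variant, assuming Keras / TensorFlow 2.x: the model predicts real-valued matrices directly with a linear output and an MSE loss (RMSE is just its square root), instead of 1000 sigmoid bits with binary cross-entropy. Filter counts and the `x_new` name are illustrative assumptions:

```python
import numpy as np
from tensorflow.keras import Input, Model, layers

dim = 10
inp = Input((dim, dim, 1))
x = layers.Conv2D(32, kernel_size=3, padding='same', activation='relu')(inp)
x = layers.Conv2D(32, kernel_size=3, padding='same', activation='relu')(x)
out = layers.Conv2D(1, kernel_size=3, padding='same', activation='linear')(x)

model = Model(inp, out)
model.compile(optimizer='adam', loss='mse')  # minimizing MSE also minimizes RMSE

# At inference time, round and clip predictions back to the valid integer range:
# pred = np.clip(np.rint(model.predict(x_new)), 1, 1000).astype(int)
```

This keeps the output the same shape as the input, so no bit-decoding step is needed, and the loss directly penalizes numeric distance rather than per-bit mismatches.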

Now, getting neural networks to work is complex; there are many different factors, and knowledge alone is not enough. You also need experience, trial and error, and so on. I recommend starting simple, getting satisfactory results, and then trying to improve your prediction accuracy step by step.

