
How to avoid stagnating loss function in tensorflow keras

I've been stuck on this for over a week now. I'm building a network that is supposed to estimate wavefront modes from a Shack-Hartmann sensor. It's a very specific task: the network sees a bunch of dots in a 254x254 picture and has to estimate 64 parameters, centered around 0, from it.

My network looks like this:

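from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense, Dropout

# y_train holds the training targets: one row of 64 parameters per image
model = Sequential()
# Convolutional feature extractor
model.add(Conv2D(32, (5, 5), activation="relu", input_shape=(256, 256, 1)))
model.add(MaxPooling2D(pool_size=(3, 3)))
model.add(Conv2D(32, (5, 5), activation="relu"))
model.add(MaxPooling2D(pool_size=(3, 3)))
model.add(Conv2D(64, (3, 3), activation="relu"))
model.add(Conv2D(64, (3, 3), activation="relu"))
model.add(Conv2D(64, (3, 3), activation="relu"))
model.add(MaxPooling2D(pool_size=(3, 3)))
# Fully connected regression head
model.add(Flatten())
model.add(Dense(512, activation="relu"))
model.add(Dropout(0.02))
model.add(Dense(512, activation="relu"))
model.add(Dropout(0.02))
# One linear output per target parameter
model.add(Dense(y_train.shape[1], activation="linear"))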

The loss decreases for a few iterations and then stagnates, with an accuracy around 0.09. To fix it I have tried changing the architecture, the loss function, the activation functions, the batch size, and the way the data is normalized. Nothing helps.

Does anyone have an idea of what I can try?

This seems like a really hard problem: take an image of dots as input and translate it into a vector of 64 parameters. I'm not surprised that the network isn't doing very well.

Also, I'm assuming the 64 parameters are real numbers in the range -1 to 1 (because of your statement "the network... has to estimate 64 parameters centered around 0 from it"). If so, this is a regression problem, not a classification problem, and accuracy doesn't make sense as a metric. Try monitoring RMSE or absolute error (AE) instead.

In a similar vein, make sure the labels are what you want. Do you want to classify the wavefronts present in the image into different classes, or do you want to assign real-valued parameters to each wavefront in the image? In the former case the labels would be one-hot vectors; in the latter they would be 64 real numbers.
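On the metrics point, here is a minimal sketch of how regression metrics could be monitored in Keras instead of accuracy. The random `y_true`/`y_pred` arrays and the tiny `demo_model` are stand-ins invented for illustration; they are not the asker's data or network.

```python
import numpy as np
import tensorflow as tf

# Stand-in targets: 8 samples of 64 real-valued parameters in [-1, 1].
rng = np.random.default_rng(0)
y_true = rng.uniform(-1.0, 1.0, size=(8, 64)).astype("float32")
# Stand-in predictions that are off by a constant 0.1 everywhere.
y_pred = y_true + 0.1

# Regression metrics report how far off the predictions are on average.
rmse = tf.keras.metrics.RootMeanSquaredError()
mae = tf.keras.metrics.MeanAbsoluteError()
rmse.update_state(y_true, y_pred)
mae.update_state(y_true, y_pred)
rmse_value = float(rmse.result())
mae_value = float(mae.result())
print("RMSE:", rmse_value, "MAE:", mae_value)

# The same metric objects can be passed to compile() so that fit()
# reports them every epoch (demo_model is a hypothetical placeholder).
demo_model = tf.keras.Sequential([
    tf.keras.Input(shape=(4,)),
    tf.keras.layers.Dense(64, activation="linear"),
])
demo_model.compile(
    optimizer="adam",
    loss="mse",
    metrics=[
        tf.keras.metrics.RootMeanSquaredError(),
        tf.keras.metrics.MeanAbsoluteError(),
    ],
)
```

Because every prediction here is off by the same amount, both metrics come out at roughly that offset, which is much easier to interpret for real-valued targets than an accuracy figure.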

A few things you can try:

  1. First, remove dropout and train on a small subset of the data to make sure the model can overfit it (i.e. 100% accuracy or 0.0 RMSE). If the model can't do this, you'll need to try a different approach (possibly a non-CNN approach).

  2. Scale your targets to real numbers between 0 and 1, and change the output activation to a sigmoid. This might give the network a better starting point, as its outputs are then constrained to lie between 0 and 1. A linear activation on the last layer lets the network predict any real number, which enlarges the space the network has to search to make a good prediction.

  3. Correspondingly, use binary crossentropy as the loss function with the sigmoid output and see how the model does.

  4. See whether a model can predict a single element of the length-64 vector, and if so train one model per parameter. This would be computationally expensive, but if it works, it works.
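The first three suggestions can be sketched together as below. This is only an illustration under assumptions: the 32x32 random images, the 16-sample subset, the small network and the learning rate are all made up so the example runs quickly, and per-parameter min-max scaling is one possible way to map the targets into the range 0 to 1.

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers

# Hypothetical tiny subset: 16 stand-in images and 64 targets in [-1, 1].
rng = np.random.default_rng(0)
x_small = rng.random((16, 32, 32, 1)).astype("float32")
y_small = rng.uniform(-1.0, 1.0, size=(16, 64)).astype("float32")

# Suggestion 2: min-max scale each target parameter into [0, 1].
y_min = y_small.min(axis=0)
y_max = y_small.max(axis=0)
y_scaled = (y_small - y_min) / (y_max - y_min)

# Suggestion 1: no dropout, so the model is free to memorize the subset.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(32, 32, 1)),
    layers.Conv2D(8, (3, 3), activation="relu"),
    layers.MaxPooling2D(pool_size=(2, 2)),
    layers.Flatten(),
    layers.Dense(128, activation="relu"),
    # Sigmoid keeps every output inside the scaled target range.
    layers.Dense(y_scaled.shape[1], activation="sigmoid"),
])

# Suggestion 3: binary crossentropy on the sigmoid outputs.
# Swapping in loss="mse" is the obvious alternative to compare against.
model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),
    loss="binary_crossentropy",
)
history = model.fit(x_small, y_scaled, epochs=100, batch_size=16, verbose=0)

# Undo the scaling to judge the fit in the original units.
pred = model.predict(x_small, verbose=0) * (y_max - y_min) + y_min
train_rmse = float(np.sqrt(np.mean((pred - y_small) ** 2)))
print("first loss:", history.history["loss"][0])
print("last loss:", history.history["loss"][-1])
print("training RMSE in original units:", train_rmse)
```

If the training RMSE on such a small subset does not fall toward zero as the number of epochs grows, the problem is more likely in the data pipeline or the model than in regularization. For the fourth suggestion, the same sketch could be reused with a single target column (for example `y_scaled[:, :1]`) and a one-unit output layer.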

Good luck.
