
How to avoid a stagnating loss function in TensorFlow Keras

I've been stuck on this for over a week now. I'm making a network that is supposed to estimate wavefront modes from a Shack-Hartmann sensor. Something very specific: basically it sees a bunch of dots in a 254x254 picture and has to estimate 64 parameters centered around 0 from it.

My network looks like:

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense, Dropout

model = Sequential()
model.add(Conv2D(32, (5, 5), activation="relu", input_shape=[256, 256, 1]))
model.add(MaxPooling2D(pool_size=(3, 3)))
model.add(Conv2D(32, (5, 5), activation="relu"))
model.add(MaxPooling2D(pool_size=(3, 3)))
model.add(Conv2D(64, (3, 3), activation="relu"))
model.add(Conv2D(64, (3, 3), activation="relu"))
model.add(Conv2D(64, (3, 3), activation="relu"))
model.add(MaxPooling2D(pool_size=(3, 3)))
model.add(Flatten())
model.add(Dense(512, activation="relu"))
model.add(Dropout(0.02))
model.add(Dense(512, activation="relu"))
model.add(Dropout(0.02))
model.add(Dense(y_train.shape[1], activation="linear"))  # one linear output per parameter (64 here)

The loss function decreases for a few iterations and then stagnates, with an accuracy around 0.09. To fix it I have tried changing the architecture, the loss function, the activation functions, normalizing the data in different ways, and changing the batch size. Nothing helps.

Does anyone have an idea of what I can try?

This seems like a really hard problem: take some input in the form of an image with dots and translate it to a vector of 64 parameters. I'm not surprised that the network isn't doing very well.

I'm assuming the 64 parameters are real numbers in the range -1 to 1 (because of your statement that the network "has to estimate 64 parameters centered around 0"). If that's the case, this is not a classification problem, and accuracy doesn't make sense as a metric. Try monitoring RMSE or MAE (mean absolute error) instead.

In a similar vein, make sure the labels are what you want. Do you want to classify the wavefronts present in the image into different classes, or do you want to assign real-valued parameters to each wavefront in the image? In the former case the labels would be one-hot vectors; in the latter they would be 64 real numbers.
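For example, a minimal sketch of compiling for regression (assuming the model from your question; the "adam" optimizer is my assumption, since your post doesn't say which one you use):

import tensorflow as tf

# Compile for regression: MSE loss, with RMSE and MAE as monitoring metrics.
# Accuracy is meaningless here because the targets are real-valued.
model.compile(
    optimizer="adam",
    loss="mse",
    metrics=[tf.keras.metrics.RootMeanSquaredError(),
             tf.keras.metrics.MeanAbsoluteError()],
)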

A few things you can try:

  1. First, remove dropout and train on a small subset of the data to make sure the model can overfit the small subset (i.e., 100% accuracy or 0.0 RMSE). If the model can't do this, you'll need to try a different approach (possibly a non-CNN approach). See the first sketch after this list.

  2. Scale your targets to real numbers between 0 and 1, and change the output activation to a sigmoid. This might give the network a better starting point, since its outputs are then confined to [0, 1]. A linear activation on the last layer lets the network predict any real number, which increases the real estate the NN has to search over to make a good prediction. (The second sketch after this list combines this with suggestion 3.)

  3. Correspondingly, use binary crossentropy as the loss function with the sigmoid output and see how the model does (again, see the second sketch below).

  4. You could see if a model can predict a single element of the length-64 vector, and train a separate model for each parameter. This would be computationally expensive, but if it works, it works. (Third sketch below.)
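For suggestion 1, a minimal sketch of the overfitting sanity check (x_train is my name for your training images; the subset size of 100 and the epoch count are arbitrary choices — and make sure the Dropout layers are removed first):

import tensorflow as tf

# Sanity check: a dropout-free model should be able to memorize a tiny subset.
# If training RMSE does not approach 0 here, rethink the architecture/labels.
x_small, y_small = x_train[:100], y_train[:100]

model.compile(optimizer="adam", loss="mse",
              metrics=[tf.keras.metrics.RootMeanSquaredError()])
history = model.fit(x_small, y_small, epochs=200, batch_size=16, verbose=0)
print("final training RMSE:", history.history["root_mean_squared_error"][-1])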
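For suggestions 2 and 3 together, a sketch of min-max scaling the targets into [0, 1], switching to a sigmoid output, and training with binary crossentropy (x_test is an assumed held-out set; keep y_min / y_max around to map predictions back):

import numpy as np

# Min-max scale the targets into [0, 1], per parameter.
y_min = y_train.min(axis=0)
y_max = y_train.max(axis=0)
y_train_scaled = (y_train - y_min) / (y_max - y_min)

# In the model definition, swap the last layer for a sigmoid output:
#   model.add(Dense(y_train.shape[1], activation="sigmoid"))
model.compile(optimizer="adam", loss="binary_crossentropy")
model.fit(x_train, y_train_scaled, epochs=50, batch_size=32)

# Undo the scaling on predictions to get back to the original range.
y_pred = model.predict(x_test) * (y_max - y_min) + y_min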
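And for suggestion 4, a sketch of training one single-output model per parameter. Here build_model is a hypothetical factory function that returns a fresh copy of your CNN with a single linear output:

import numpy as np

# Train 64 independent models, one per wavefront parameter.
# Expensive, but it isolates which parameters are learnable at all.
models = []
for i in range(y_train.shape[1]):
    m = build_model(n_outputs=1)  # hypothetical: rebuilds the CNN above with 1 output
    m.compile(optimizer="adam", loss="mse")
    m.fit(x_train, y_train[:, i], epochs=50, batch_size=32, verbose=0)
    models.append(m)

# Stack the per-parameter predictions back into an (n_samples, 64) array.
y_pred = np.hstack([m.predict(x_test) for m in models])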

Good luck.
